SCA7 GENE AND METHODS OF USE

ABSTRACT OF THE INVENTION

The present invention provides diagnostic methods of identifying
individuals at risk and not at risk of developing spinocerebellar ataxia
type 7. The present invention also provides methods for identifying
expanded repeats, and the DNA flanking the expanded repeats, from
genomic DNA.

STATEMENT OF GOVERNMENT RIGHTS 
The present invention was made with government support under Grant No.
5P01-NS33718-03, awarded by the National Institutes of Health. The Government
has certain rights in this invention.

BACKGROUND OF THE INVENTION

Trinucleotide repeat expansions have been shown to be the mutational
mechanism responsible for a growing number of diseases, including Fragile X mental
retardation, spinobulbar muscular atrophy, myotonic dystrophy (DM), Huntington
disease (HD), spinocerebellar ataxia (SCA) types 1, 2, 3 and 6, dentatorubral
pallidoluysian atrophy and Friedreich's ataxia. A hallmark of most of these diseases
is the presence of anticipation, or a decrease in the age of onset and increase in disease
severity in consecutive generations due to the tendency for the unstable trinucleotide
repeat tract to lengthen when passed from one generation to the next (Warren, S.T.,
Science, 271, 1374-1375 (1996)).

In 1993, Schalling et al. (Nature Genetics, 4, 135-139 (1993)) developed the
repeat expansion detection (RED) assay. RED is an elegant technique that detects
potentially pathological trinucleotide repeat expansions without prior knowledge of
chromosomal location or flanking DNA sequence. Human genomic DNA is used as a
template for a two-step ligation cycling process that generates sequence-specific
[(CAG)n, (CGG)n, etc.] oligonucleotide multimers when expanded trinucleotide
sequences are present in the genome. The assay was originally developed to detect
very large trinucleotide repeat expansions present in genomic DNA from patients with
myotonic dystrophy (DM) and Fragile X syndrome (up to 2,000 repeats). Since that
time, Lindblad et al. have modified the procedure to detect smaller trinucleotide
repeats in the size range (40-100 CAG repeats) pathologic for SCA1, SCA3, HD, and
SBMA (Lindblad, K., et al., Nature Genetics, 7, 124 (1994); Lindblad, K. et al.,
Genome Research, 6, 965-971 (1996)).

This modified assay has been used to establish correlations that suggest the
involvement of CAG expansions in diseases such as SCA7 (Lindblad, K. et al.,
Genome Research, 6, 965-971 (1996)), bipolar affective disorder (Oruc, L. et al., Am. J.
Hum. Genet., 60, 732-735 (1997)) and schizophrenia (Maraganore, D.M., et al.,
Neurology (1996)).

The spinocerebellar ataxias (SCAs) are progressive neurodegenerative
diseases characterized by degeneration of neurons
of the cerebellar cortex. Degeneration is also seen in the deep cerebellar nuclei, brain
stem, and spinal cord. Clinically, affected individuals suffer from severe ataxia and
dysarthria, as well as from variable degrees of motor disturbance and neuropathy. The
disease usually results in complete disability and eventually in death 10 to 30 years
after onset of symptoms. The genes for SCA types 1, 2, 3 and 6 have been identified.
All contain CAG DNA repeats that cause the disease when the repeat region is
expanded. Little is known about how CAG repeat expansion and elongation of
polyglutamine tracts relate to neurodegeneration. The identification of the SCA7 gene
would provide an opportunity to study this phenomenon in a new protein system.

Identifying ataxia genes provides an improved method for
diagnosis of individuals with the disease and increases the possibility of
prenatal/presymptomatic diagnosis or better classification of ataxias. Most of the
genes associated with repeat expansions in the coding region, including the other SCA
genes now identified, show no homology to known genes.

SUMMARY OF THE INVENTION 

The present invention relates to methods for identifying individuals at risk and
individuals not at risk for developing spinocerebellar ataxia type 7. These methods
include the step of analyzing the CAG repeat region of a spinocerebellar ataxia type 7
gene wherein individuals at risk for developing spinocerebellar ataxia type 7 typically
have at least about 30, more typically at least about 37 and even more typically at least
about 38 CAG repeats. A person not at risk typically has less than about 19, more
typically less than about 15, and even more typically less than about 5 CAG repeats.
The methods can include the steps of performing a polymerase chain reaction with
oligonucleotide primers capable of amplifying the CAG repeat region located within
the spinocerebellar ataxia type 7 gene, and detecting amplified DNA fragments
containing the CAG repeat region. The oligonucleotide primers can be selected from
the nucleotide region of SEQ ID NO:9 and from the region of SEQ ID NO:10.
Preferred oligonucleotide primers are SEQ ID NO:5 and SEQ ID NO:6.

The methods for identifying individuals at risk for developing spinocerebellar
ataxia type 7 can also include detecting the presence of a DNA molecule containing a
CAG repeat region of the SCA7 gene by digesting genomic DNA with a
restriction endonuclease and probing the DNA fragments under hybridizing
conditions with a detectably labeled gene probe so as to detect a nucleic acid molecule
containing a CAG repeat region of an isolated SCA7 gene.

The present invention provides isolated nucleic acids encoding the human
SCA7 protein and portions thereof and isolated proteins and portions thereof encoded
by the nucleic acid. The invention also relates to isolated DNA fragments, vectors
and isolated recombinant vectors containing the nucleic acids of this invention,
oligonucleotide probes and primers that hybridize to the SCA7 nucleic acid, host cells
transformed or transfected with SCA7 or fragments thereof, compositions containing
antibodies that specifically bind to polypeptides encoded by all or part of the SCA7
nucleic acid, a method for detecting the SCA7 disorder including using a biological
sample to form an antibody-antigen complex, and to model systems that express the
SCA7 protein.

The present invention provides a kit for detecting whether or not an
individual is at risk for developing spinocerebellar ataxia type 7. One preferred kit
includes oligonucleotides selected from the nucleotide region of SEQ ID NO:9 and
from the region of SEQ ID NO:10.

In another aspect, the invention relates to a procedure for
rapidly identifying and isolating expanded repeats and the corresponding flanking
nucleotide sequence directly from small amounts of genomic DNA using a process of
Repeat Analysis, Pooled Isolation, and Detection of individual clones containing
expanded repeats (RAPID cloning). The method includes the steps of fractionating a
population of DNA fragments and detecting the fraction that contains an expanded
repeat, cloning the DNA fragments contained in the fraction of DNA that contains an
expanded repeat, and identifying the clones that contain the expanded repeat. The
fractionation step can include digesting genomic DNA with a restriction enzyme to
obtain DNA fragments, resolving the DNA fragments by gel electrophoresis, dividing
into fractions on the basis of size, and detecting the presence of an expanded repeat in
each size fraction. The nucleotide sequence flanking the expanded repeat can then be
determined and used to design a PCR assay to determine if a particular repeat
cosegregates with a given disease.

The invention also relates to an improvement of the repeat expansion detection
assay wherein the rate of temperature change from the denaturation temperature is
decreased and wherein the ligation buffer contains formamide. Preferably, the rate of
temperature change from the denaturation temperature is decreased to 2 seconds per
degree and the ligation buffer contains 4% formamide.

BRIEF DESCRIPTION OF THE FIGURES 

Fig. 1 Schematic overview of RAPID cloning of expanded trinucleotide
repeats from genomic DNA and cDNA. In general, the Repeat Expansion Detection
(RED) assay is used to follow an expanded trinucleotide repeat present in either
genomic DNA or cDNA through a series of enrichment steps until a single, purified
clone is obtained. Genomic DNA is digested with a restriction enzyme, the fragments
are size-fractionated with agarose gel electrophoresis, and RED analysis is performed
on the size fractions to determine which contains an expanded CAG repeat (Fig. 3).
The RED positive fraction is cloned, DNA from clone pools consisting of
approximately 5x10^4 clones each is assayed, and a RED positive clone pool is
subjected to a post-cloning CAG-enrichment procedure (Fig. 5). Clones from the
enriched library are then assayed either individually or in small pools of
approximately 20 clones each to determine which clones contain the CAG expansion.

Fig. 2 RED analysis of genomic DNA control samples. A: An optimized RED
procedure was performed on genomic DNA from individuals with CAG expansions of
known size. The size of the SCA1 (47 CAG repeats), SCA3 (71 repeats), HD (66
repeats) and DM (80 repeats) alleles as measured by PCR assays is shown below the
phosphorimage panels. The sizes of the RED products are indicated at the side of the
panel, and the annealing temperature used for the ligation reaction is indicated above. For
each genomic sample assayed, the largest RED product corresponds appropriately
with the size of the known expanded CAG allele. B: The RED assay was repeated for
the SCA3, HD, and DM genomic samples in a reaction buffer that did not contain
formamide but was otherwise identical to that used for panel "A".

Fig. 3 Two-dimensional RED (2D-RED) analysis. A: Schematic overview of
the 2D-RED procedure. Genomic DNA is digested with a restriction enzyme and
size-separated on an agarose gel. The lane containing the DNA is excised and uniformly
cut every 2 mm along its length using a gel-slicing device. Each slice is placed in a
separate Eppendorf tube, the agarose is digested, and the DNA is precipitated and
resuspended in a small volume of buffer. RED analysis is then performed on
individual size fractions. In the example shown schematically, a genomic DNA
sample that generates a RED70 product (left) is resolved by 2D-RED into separate
RED40 and RED70 size fractions (right). B: Genomic DNA from an individual with
a known SCA3 expansion was digested with MboI and size-fractionated. The size
distribution of the critical fractions as measured by running a portion on an agarose
gel is shown. C: RED analysis of the size fractions shown in panel "B". Fractions 3
and 4 generate the RED70 product expected for the expanded SCA3 allele present in
the original genomic sample.

Fig. 4 RED analysis of genomic DNA samples from the MN1 kindred. The
RED products generated from one of the affected individuals and from eight spouse
samples are shown.

Fig. 5 Cloning and post-cloning enrichment of an expanded CAG repeat from
the MN1 kindred. A: RED analysis of plasmid DNA isolated from unenriched clone
pools. The 2D-RED size-fractionated genomic DNA containing the long MN1 CAG
repeat was cloned into a lambda vector. The resulting library was amplified in pools
consisting of approximately 5x10^4 primary clones and then converted en masse into
plasmid library pools. B: Schematic overview of the post-cloning enrichment
procedure. A dsDNA library is converted to an ssDNA library in which uracil has
been incorporated into the DNA strand. A (CAG)10 oligo (SEQ ID NO:4) is used to
prime second-strand synthesis in those clones that contain a long CAG repeat. Uracil-
DNA glycosylase (UDG) is then added to remove the uracil residues from the original
DNA strand. Transformation of this mixture into E. coli results in the repair and
replication of CAG-containing dsDNA clones and in the degradation and elimination
of the ssDNA background. C: RED analysis of CAG-enriched clone pools derived
from primary pools 3 and 8 (panel B). Each pool contains DNA from 20 individual
clones. D: Nucleotide sequence of the genomic DNA flanking the MN1 CAG repeat.
The CAG repeat (underlined) varies from 11 to 24 repeats in individuals from the
MN1 kindred that do not have an expansion, and has over 100 repeats in individuals
with an expansion.

Fig. 6 Pedigree of the MN1 kindred. The number of CAG repeats in each
allele of the MN1 CAG repeat sequence is indicated numerically, and the individual
from whom the expanded CAG was isolated is starred.

Fig. 7 RAPID cloning of the SCA7 expanded CAG repeat. A: 2D-RED
analysis of EcoRI-digested genomic DNA isolated from an individual with an
autosomal dominant ataxia with retinopathy (individual A1, Fig. 8). The genomic
DNA size-fraction containing the CAG expansion (indicated by *) was cloned into a
lambda vector. The resulting library was amplified in pools that were converted en
masse into plasmid library pools. B: RED analysis of CTG-enriched clone pools
derived from a RED-positive primary clone pool. Each pool contains DNA from 36
individual clones. RED analysis of plasmid DNA from the individual clones in pool 4
identified two clones containing the expanded CAG repeat. C: Nucleotide sequence of
the genomic DNA flanking the SCA7 expansion in clone 4-2. The CAG expansion is
underlined.

Fig. 8 PCR analysis of the SCA7 CAG alleles in kindreds diagnosed with
autosomal dominant ataxia with retinopathy. The estimated age of onset (in
parentheses) and number of CAG repeats in the SCA7 expansion are indicated
numerically in each kindred. The individual from whom the expanded CAG was
isolated is starred (A1). Agarose gel analysis of the PCR products generated from
genomic DNA of the indicated individuals is shown in the inset. An expanded allele
was present only in affected or at-risk individuals, and the size of the expansion is
inversely proportional to the age of onset.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The hereditary ataxias are a complex group of neurodegenerative disorders
all characterized by varying abnormalities of balance attributed to dysfunction or
pathology of the cerebellum and cerebellar pathways. In many of these disorders,
dysfunction or structural abnormalities extend beyond the cerebellum and may
involve basal ganglia function, oculo-motor disorders and neuropathy. Substantial
efforts have been made to determine the genetic bases of the spinocerebellar ataxias
(SCAs). SCA genes have been identified on different chromosomes and have
different conserved sequences. While all contain CAG DNA repeats that are
associated with disease when the repeat is expanded, the underlying mechanism
leading to neurodegeneration is unknown. Moreover, the high phenotypic variability
within single SCA pedigrees has made clinical classification of different forms of
ataxia difficult.

The gene for SCA type 7 has been identified and isolated. The isolation of
the SCA7 gene allows the easy diagnosis of one type of the spinocerebellar ataxias.
Diagnosis can be the presymptomatic identification of individuals at risk of ataxia,
including the identification of individuals where there is no family history of the
disease.

In one aspect of this invention, a method for identifying genes with
expanded repeats is provided. As used herein, "expanded repeat" or "repeat
expansion" refers to a single short repeating unit of nucleotides. The repeating unit
can typically include between 2 and 8 nucleotides. These repeats can be present in the
coding sequence of genes that encode polypeptides that function properly. In some
individuals the number of repeats increases, or expands, and translation of
the gene containing an expanded repeat can result in a polypeptide that causes
disease. A repeat is considered to be an expanded repeat when the number of
consecutive repeats is associated with disease. An expanded repeat includes repeats
of 2 nucleotides (dinucleotide repeat), 3 nucleotides (trinucleotide repeat), etc., up
to and including repeats of 8 nucleotides, for example.

Any gene containing an expanded repeat can be identified by this method.
Preferably, a gene identified by this method will contain a trinucleotide repeat,
including for instance genes for SCA types 1, 2, 3, 6 and 7. Utilization of different
oligonucleotides allows any of the 10 possible trinucleotide repeats to be detected
(Lindblad, K., et al., Nature Genetics, 7, 124 (1994)). Preferably, the CAG repeat
is identified. Preferably, the CAG repeat is present in the SCA7 gene or in the
gene comprising the nucleotides of SEQ ID NO:3. This method first optimizes a
repeat expansion detection assay and provides methods for enriching and isolating
DNA containing expanded CAG repeats and flanking DNA. Preferably, the
number of CAG repeats is greater than 20. More preferably, the number of CAG
repeats is greater than 30.
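
The figure of 10 possible trinucleotide repeats can be checked by enumeration. The following Python sketch is illustrative only and rests on stated assumptions: two repeat units are treated as the same repeat when one is a cyclic rotation of the other (the same tract read from a shifted start) or the reverse complement of the other (the same tract read on the opposite strand), and homopolymer units such as AAA are excluded. The function names are hypothetical and do not come from the specification.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_unit(unit: str) -> str:
    """Pick one representative label for a repeat unit.

    All rotations of the unit and of its reverse complement describe the
    same double-stranded repeat tract, so the alphabetically smallest of
    them is used as the class label (an arbitrary but stable choice).
    """
    candidates = []
    for strand in (unit, reverse_complement(unit)):
        for shift in range(len(strand)):
            candidates.append(strand[shift:] + strand[:shift])
    return min(candidates)


def trinucleotide_repeat_classes() -> list[str]:
    """Enumerate distinct trinucleotide repeat classes, homopolymers excluded."""
    classes = set()
    for bases in product("ACGT", repeat=3):
        unit = "".join(bases)
        if len(set(unit)) == 1:  # AAA, CCC, GGG, TTT are mononucleotide runs
            continue
        classes.add(canonical_unit(unit))
    return sorted(classes)


if __name__ == "__main__":
    classes = trinucleotide_repeat_classes()
    print(len(classes), classes)  # 10 classes under the assumptions above
```

Under this grouping CAG and CTG fall into a single class, which is consistent with a CAG expansion on one strand appearing as a CTG expansion on the other.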

One method to detect expanded repeats is the repeat expansion detection
(RED) assay. RED analysis has been performed on several of these patient
populations and has led to reports of correlations between trinucleotide expansions
and disease. However, RED analysis with defined genomic DNA control templates
yields inconsistent results and typically does not correlate with the size of the largest
known CAG expansion in the genomic sample. Another limitation of RED analysis
has been that although it can detect novel trinucleotide expansions, RED alone cannot
directly determine if an expanded repeat causes disease or is merely one of a number
of background repeat expansions found in the general population. Consequently, the
role of an expanded trinucleotide repeat as a possible pathogenic mutation in a disease
kindred must be conclusively confirmed. This requires the isolation of the expansions
present in these populations and detailed PCR analysis to assess whether or not the
expanded trinucleotide repeat cosegregates with the disease.

A. Trinucleotide Repeat Expansions and Method of Diagnosis 
The identification of an improved method to identify trinucleotide repeat
expansions associated with a disease allows for improved diagnosis of the disease.
Thus, the present invention relates to methods of diagnosing individuals at risk of
developing diseases that are caused by a trinucleotide repeat expansion. The
invention also relates to methods of diagnosing individuals not at risk. These
diagnostic methods can be used to identify individuals at risk of developing
spinocerebellar ataxia type 1, 2, 3, 6 or 7 by analyzing the trinucleotide repeat region
of a gene. Preferably, the CAG repeat is identified. Preferably, the CAG repeat is
present in the SCA7 gene. The SCA7 gene of an individual not at risk of developing
spinocerebellar ataxia type 7 typically contains less than about 19, more typically less
than about 15, and even more typically less than about 5 CAG repeats. The SCA7
gene of an individual at risk of developing spinocerebellar ataxia type 7 typically
contains at least about 30, more typically at least about 37 and even more typically at
least about 38 CAG repeats.
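
The repeat-number ranges stated above can be expressed as a simple classification rule. The Python sketch below is a minimal illustration, not a diagnostic implementation. It assumes that the broadest stated bounds ("less than about 19" and "at least about 30") are used as hard cut-offs, that counts falling between them are reported as indeterminate, and that an individual is classified by the longer of two alleles because the disorder is described as dominantly inherited. All names in the code are hypothetical.

```python
from enum import Enum


class RiskCategory(Enum):
    NOT_AT_RISK = "not at risk"
    INDETERMINATE = "indeterminate"
    AT_RISK = "at risk"


# Broadest bounds quoted in the text. The text says "about", so treating
# them as exact cut-offs is an assumption made for this sketch.
NOT_AT_RISK_BELOW = 19
AT_RISK_FROM = 30


def classify_allele(cag_repeats: int) -> RiskCategory:
    """Classify a single SCA7 allele by its number of CAG repeats."""
    if cag_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cag_repeats < NOT_AT_RISK_BELOW:
        return RiskCategory.NOT_AT_RISK
    if cag_repeats >= AT_RISK_FROM:
        return RiskCategory.AT_RISK
    # Counts between the two bounds are not characterized in the text.
    return RiskCategory.INDETERMINATE


def classify_individual(allele_a: int, allele_b: int) -> RiskCategory:
    """Classify an individual by the longer allele (dominant-inheritance assumption)."""
    return classify_allele(max(allele_a, allele_b))


if __name__ == "__main__":
    for alleles in [(10, 12), (10, 24), (10, 45)]:
        print(alleles, classify_individual(*alleles).value)
```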

These diagnostic methods can involve any known method for detecting a
specific fragment of DNA. These methods can include direct detection of the DNA or
indirect detection through the detection of RNA or proteins, for example. For
example, RED analysis can be used. Alternatively, Southern or Northern blotting
hybridization techniques using labeled probes can be used. PCR techniques can be
used with novel primers that amplify the CAG repeating region of a gene containing a
trinucleotide repeat expansion. Nucleic acid sequencing can also be used as a direct
method of determining the number of trinucleotide repeats.
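
Once a sequence read spanning the repeat is available, counting the repeats is a short string operation. The Python sketch below assumes the read is a plain string of A, C, G and T, and that only uninterrupted runs of the repeat unit are counted; interrupted tracts would need a more tolerant approach. The function name and the example sequences are hypothetical.

```python
import re


def longest_repeat_run(sequence: str, unit: str = "CAG") -> int:
    """Return the number of units in the longest uninterrupted run of `unit`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq = sequence.upper()
    # One or more consecutive copies of the unit, e.g. CAGCAGCAG.
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    longest = 0
    for match in pattern.finditer(seq):
        longest = max(longest, len(match.group(0)) // len(unit))
    return longest


if __name__ == "__main__":
    # Hypothetical read: arbitrary flanking bases around 12 CAG units.
    read = "GGTTCC" + "CAG" * 12 + "AATTGC"
    print(longest_repeat_run(read))
```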

As used herein, "hybridizes," "hybridizing" and "hybridization" mean that the
oligonucleotide forms a noncovalent interaction with the target nucleic acid
molecule under standard stringency conditions. The hybridizing oligonucleotide may contain
nonhybridizing nucleotides that do not interfere with forming the noncovalent
interaction, e.g., a restriction enzyme recognition site to facilitate cloning.

The gene for SCA7 contains a highly polymorphic CAG repeat that is located
within a 1.86 kb fragment produced by digestion of the candidate region with the
restriction enzyme EcoRI. The CAG repeat region preferably lies within the coding
region and codes for polyglutamine. This region of CAG repeating sequences is
unstable and expanded in individuals with SCA7. PCR analysis of the (CAG)n repeat,
for example, demonstrates a correlation between the size of the repeat expansion and
the age-at-onset of SCA7 and severity of the disorder. That is, individuals with more
repeat units (or longer repeat tracts) tend to have both an earlier age of onset and a more
severe disease course. These results demonstrate that SCA7, like hereditary ataxia
associated with SCA1, fragile X syndrome, myotonic dystrophy, X-linked spinobulbar
muscular atrophy, and Huntington disease, displays a mutational mechanism
involving expansion of an unstable trinucleotide repeat.

In one embodiment of the present invention, DNA probes can be used for
identifying DNA segments of the affected allele of the SCA7 gene. DNA probes are
segments of labeled, single-stranded DNA which will hybridize, or noncovalently
bind, with complementary single-stranded DNA derived from the gene sought to be
identified. The probe can be labeled with any suitable label known to those skilled in
the art, including radioactive and nonradioactive labels. Typical radioactive labels
include 32P, 125I, 35S, and the like. Nonradioactive labels include, for example, ligands
such as biotin or digoxigenin as well as enzymes such as phosphatase or peroxidases,
or the various chemiluminescers such as luciferin, or fluorescent compounds like
fluorescein and its derivatives. The probe may also be labeled at both ends with
different types of labels for ease of separation, as, for example, by using an isotopic
label at one end and a biotin label at the other end.

The present invention relates to a method for detecting the presence of DNA
molecules containing a CAG repeat region in which a sample of genomic DNA is
digested with a restriction endonuclease and the resulting DNA fragments are probed
with an oligonucleotide probe. Using DNA probe analysis, the target DNA can be
derived by the enzymatic digestion, fractionation, and denaturation of genomic DNA
to yield a complex mixture incorporating the DNA from many different genes,
including DNA from the short arm of chromosome 3, which includes the SCA7 locus.
A specific DNA gene probe will hybridize only with DNA derived from its target
gene or gene fragment, and the resultant complex can be isolated and identified by
techniques known in the art. In one embodiment, the method involves digesting
genomic DNA with a restriction endonuclease to obtain DNA fragments, probing the
fragments under hybridizing conditions with a detectably labeled gene probe, which
hybridizes to a nucleic acid molecule containing a CAG repeat region of an isolated
SCA7 gene having at least about 11 nucleotides, detecting probe DNA which has
hybridized to the DNA fragments, and analyzing the DNA fragments for a CAG
repeat region characteristic of the normal or affected forms of the SCA7 gene.

The probes are oligonucleotides, either synthetic or naturally occurring,
capable of hybridizing to the region of the DNA sequence flanking the CAG repeating
trinucleotides and optionally hybridizing to the DNA sequence encoding the CAG
repeat. Preferably, the probes hybridize to the SCA7 locus of the short arm of
chromosome 3. The probe includes a nucleotide sequence substantially
complementary to a portion of a strand of an affected or a normal allele of a fragment
(preferably a 1.86 kb EcoRI fragment) of an SCA7 gene having a (CAG)n region. The
probe sequence has at least about 11 nucleotides, preferably at least 15 nucleotides
and no more than about 35 nucleotides. The probes are chosen such that the
nucleotide sequence is substantially complementary to a portion of a strand of an
affected or a normal allele within about 1000 nucleotides 5' of the (CAG)n region,
including directly adjacent to the (CAG)n region. Alternatively, the probes are chosen
such that the nucleotide sequence is substantially complementary to a portion of a strand
of an affected or a normal allele within about 800 nucleotides 3' of the (CAG)n region,
including directly adjacent to the (CAG)n region. The probes can also comprise at
least 15 nucleotides from SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:13 or SEQ ID
NO:14.

In general, for detecting the presence of a DNA sequence located within the
SCA7 gene, the genomic DNA is digested with a restriction endonuclease to obtain
DNA fragments. The source of genomic DNA to be tested can be any biological
specimen that contains DNA. Examples include specimens of blood, semen, vaginal
swabs, tissue, hair, and body fluids. The restriction endonuclease can be any that will
cut the genomic DNA into fragments of double-stranded DNA having a particular
nucleotide sequence. The specificities of numerous endonucleases are well known and
can be found in a variety of publications, e.g. Sambrook et al.; Molecular Cloning: A
Laboratory Manual; Cold Spring Harbor Laboratory: New York (1989). Preferred
restriction endonuclease enzymes include EcoRI, TaqI, and BstNI. EcoRI is
particularly preferred.
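
The digest-then-probe step can be mimicked in software for a known sequence. The Python sketch below is a simplified illustration under stated assumptions: it models only the top strand of a linear molecule, uses the EcoRI recognition site GAATTC cut after the first G, and stands in for hybridization with an exact substring search for a run of CAG units. The function names and the test sequence are hypothetical.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a linear sequence at every occurrence of a recognition site.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment; 1 corresponds to EcoRI cutting G^AATTC on the top strand.
    """
    seq = sequence.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


def fragments_with_repeat(fragments: list[str], unit: str = "CAG",
                          min_repeats: int = 5) -> list[str]:
    """Return fragments containing at least `min_repeats` consecutive units.

    This exact-match search is only a stand-in for probe hybridization.
    """
    target = unit.upper() * min_repeats
    return [fragment for fragment in fragments if target in fragment]


if __name__ == "__main__":
    # Hypothetical sequence: two EcoRI sites flanking a (CAG)8 tract.
    genome = "TTACGAATTCGG" + "CAG" * 8 + "CCGAATTCAT"
    pieces = digest(genome)
    print([len(piece) for piece in pieces])
    print(fragments_with_repeat(pieces))
```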

Diagnosis of the disease can alternatively involve the use of the polymerase
chain reaction sequence amplification method (PCR) using novel primers. U.S.
Patent No. 4,683,195 (Mullis et al., issued July 28, 1987) describes a process for
amplifying, detecting and/or cloning nucleic acid sequences. This method involves
treating extracted DNA to form single-stranded complementary strands, treating the
separate complementary strands of DNA with a molar excess of two oligonucleotide
primers, extending the primers to form complementary extension products that act as
templates for synthesizing the desired nucleic acid molecule, detecting the amplified
DNA molecule, and analyzing the amplified molecule for a CAG repeat region
characteristic of the SCA7 disorder. More specifically, the method steps of treating
the DNA with primers and extending the primers include the steps of: adding a pair
of oligonucleotide primers, wherein one primer of the pair is substantially
complementary to part of the sequence in the sense strand and the other primer of each
pair is substantially complementary to a different part of the same sequence in the
complementary antisense strand; annealing the paired primers to the complementary
molecule; simultaneously extending the annealed primers from a 3' terminus of each
primer to synthesize an extension product complementary to the strands annealed to
each primer wherein said extension products after separation from the complement
serve as templates for the synthesis of an extension product for the other primer of
each pair; and separating said extension products from said templates to produce
single-stranded molecules. Variations of the method are described in U.S. Patent No.
4,683,194 (Saiki et al., issued July 28, 1987). The polymerase chain reaction
sequence amplification method is also described by Saiki et al., Science, 230,
1350-1354 (1985) and Scharf et al., Science, 324, 163-166 (1986).

As used herein, the terms "amplified DNA molecule" and "amplified DNA
fragment" refer to DNA molecules that are copies of a portion of DNA and its
complementary sequence. The copies correspond in nucleotide sequence to the
original DNA sequence and its complementary sequence. The terms "complement"
and "complementary", as used herein, refer to a DNA sequence that is
complementary (having greater than 65% homology) to a specified DNA sequence.
The term "primer pair", as used herein, means a set of primers including a 5' upstream
primer that hybridizes with the 5' end of the DNA molecule to be amplified and a 3'
downstream primer that hybridizes with the complement of the 3' end of the molecule
to be amplified.

The primers are oligonucleotides, either synthetic or naturally occurring,
capable of acting as a point of initiating synthesis of a product complementary to the
region of the DNA sequence containing the CAG repeating trinucleotides of the SCA7
locus of the short arm of chromosome 3. The primer includes a nucleotide sequence
substantially complementary to a portion of a strand of an affected or a normal allele
of a fragment (preferably a 1.86 kb EcoRI fragment) of an SCA7 gene having a
(CAG)n region. The primer sequence has at least about 11 nucleotides, and
preferably, at least about 16 nucleotides and no more than about 35 nucleotides. The
primers are chosen such that they produce a primed product of about 70-350 base
pairs, preferably about 100-300 base pairs. More preferably, the primers are chosen
such that the nucleotide sequence is substantially complementary to a portion of a strand
of an affected or a normal allele within about 150 nucleotides on either side of the
(CAG)n region, including directly adjacent to the (CAG)n region.

The invention discloses conserved regions flanking the CAG repeat region of
SCA7. Oligonucleotides suitable for polymerase chain reaction amplification can be
selected from the regions flanking the CAG repeat region both 5' and 3' to the CAG
repeat region. The regions of the SCA7 gene from which oligonucleotide primers can
be selected are from the nucleotides of SEQ ID NO:9 and SEQ ID NO:10. Preferred
primers are SEQ ID NO:5 and SEQ ID NO:6. This primer set successfully amplifies
the CAG repeat units of interest using PCR technology. These oligonucleotides are
useful for amplifying the CAG repeat region from the SCA7 gene from DNA taken
from an individual suspected of having spinocerebellar ataxia. Oligonucleotide
primers can also be selected from the nucleotides of SEQ ID NO:13 and SEQ ID
NO:14. The amplified fragments can be run on a gel to detect the length of the CAG
repeat region. Individuals at risk for developing spinocerebellar ataxia type 7
typically have at least about 30, more typically at least about 37 and even more
typically at least about 38 CAG repeats. A person not at risk typically has less than
about 19, more typically less than about 15, and even more typically less than about 5
CAG repeats. Alternatively, the primer pair can be used in various known techniques
to sequence the SCA7 gene.
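The relationship between the size of an amplified fragment and the repeat count, and the comparison of that count with the ranges given above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, `flanking_bp` (the non-repeat portion of the amplified product, which depends on the primer pair used) is an assumed input that is not specified in this section, and the default thresholds are the broadest values stated above (at least about 30; less than about 19).

```python
def repeats_from_product(product_bp: int, flanking_bp: int) -> int:
    """Estimate the CAG repeat count from an amplified product size.

    flanking_bp is an assumed input: the number of base pairs in the product
    that lie outside the (CAG)n region, including the primers themselves.
    """
    if flanking_bp < 0 or product_bp < flanking_bp:
        raise ValueError("product must be at least as long as its flanking portion")
    return (product_bp - flanking_bp) // 3  # three base pairs per CAG repeat


def classify_repeat_count(repeats: int,
                          at_risk_min: int = 30,
                          not_at_risk_below: int = 19) -> str:
    """Compare a repeat count with the ranges stated in the text."""
    if repeats >= at_risk_min:
        return "at risk"
    if repeats < not_at_risk_below:
        return "not at risk"
    return "indeterminate"


# Hypothetical example: a 250 bp product of which 130 bp is flanking sequence.
n = repeats_from_product(250, 130)
print(n, classify_repeat_count(n))
```

The stricter values mentioned above (37 or 38 repeats) can be passed as `at_risk_min` when the narrower range is wanted.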

The invention also relates to a kit for detecting whether or not an individual is
at risk for developing the disease associated with an expanded repeat. The method
used in the kit for detecting whether or not an individual is at risk for developing the
disease associated with an expanded repeat includes all methods for detecting an
extended repeat disclosed herein. Preferably, the expanded repeat detected is CAG.
Preferably, the CAG repeat is present in the SCA7 gene.

As stated previously, other methods of diagnosis can be used as well. They
can be based on the isolation and identification of the repeat region of genomic DNA
(CAG repeat region), cDNA (CAG repeat region), mRNA (CAG repeat region), and
protein products (glutamine repeat region). These include, for example, using a
variety of electrophoresis techniques to detect slight changes in the nucleotide
sequence of the SCA7 gene. Further nonlimiting examples include denaturing
gradient electrophoresis, single strand conformational polymorphism gels, and
nondenaturing gel electrophoresis techniques.

The mapping and cloning of the SCA7 gene allows the definitive diagnosis of
one type of the dominantly inherited ataxias using a simple blood test. This represents
the first step towards an unequivocal molecular classification of the dominant ataxias.
A simple and reliable classification system for the ataxias is important because the
clinical symptoms overlap extensively between the SCA7 and the non-SCA7 forms of
the disease. Furthermore, a molecular test for the only known SCA7 mutation permits
presymptomatic diagnosis of disease in known SCA7 families and allows for the
identification of sporadic or isolated CAG repeat expansions where there is no family
history of the disease. Thus, the present invention can be used in family counseling,
planning medical treatment, and in standard work-ups of patients with ataxia of
unknown etiology.

B. Identification of Expanded Repeats from Genomic DNA

One aspect of this invention relates to an improved method of performing
RED analysis to evaluate the presence and size of expanded repeats in a DNA sample.

Another aspect of this invention relates to a method (referred to as 2D-RED)
of identifying a genomic size fraction wherein the fraction is enriched for DNA
fragments that contain an expanded repeat. While it is understood that this method is
not limited to the identification of expanded trinucleotide repeats, for purposes of this
discussion only expanded trinucleotide repeats will be considered. When RED
analysis is performed on a genomic DNA sample that contains multiple trinucleotide
expansions, the RED ladders generated from the expansions are superimposed on one
another and the size of the largest ligation product corresponds with the approximate
size of the largest expansion. In 2D-RED analysis the multiple expansions present in a
genomic sample are physically separated from one another prior to RED analysis,
preferably using size-fractionation of genomic restriction fragments. In addition to
identifying an enriched genomic size fraction for use in subsequent cloning and
isolation procedures, the 2D-RED assay measures both the number and size of the
expansions present in the genome. Analysis of CAG trinucleotide repeats using
2D-RED analysis has revealed that human genomic DNA samples contain two to four
size fractions that typically generate a CAG RED30 product, a single size fraction that
typically generates a RED40 product, and a small but variable number of fractions
that generate larger RED products (Fig 7A).

Another aspect of this invention relates to a method for isolating expanded
trinucleotide repeats and corresponding flanking sequence from genomic DNA. This
method of Repeat Analysis, Pooled Isolation, and Detection of individual
RED-positive clones (RAPID cloning) uses the RED assay to follow the expanded repeat
through a series of pooled enrichment steps until a single RED-positive clone is
obtained. This method can be divided into 1) fractionating a population of DNA
fragments and detecting the fraction that contains an extended repeat; 2) cloning the
DNA fragments contained in the fraction of DNA that contains an extended repeat;
and 3) identifying clones that contain the extended repeat. The fractionation step can
be further divided into 1) digesting genomic DNA with a restriction enzyme to obtain
DNA fragments; 2) resolving the DNA fragments by gel electrophoresis and dividing
into fractions on the basis of size; and 3) detecting the presence of an extended
trinucleotide repeat in each size fraction. Preferably, the RED assay is used to detect
the presence of an extended trinucleotide repeat in each size fraction.
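The idea of following an expanded repeat through pooled enrichment steps until a single positive clone remains can be sketched as a repeated subdivision. The Python below is an idealized illustration, not the laboratory procedure: the function name, the `is_red_positive` callback (a stand-in for running the RED assay on a pool), and the number of pools per round are all assumptions introduced for the example.

```python
from typing import Callable, List, Sequence, Tuple


def rapid_enrich(clones: Sequence[int],
                 is_red_positive: Callable[[List[int]], bool],
                 pools_per_round: int = 4) -> Tuple[int, int]:
    """Narrow a clone library down to one RED-positive clone.

    Each round splits the current pool into smaller pools, assays every pool
    (here via the assumed is_red_positive callback) and keeps the first
    positive pool. Returns (clone, number_of_rounds).
    """
    if pools_per_round < 2:
        raise ValueError("need at least two pools per round")
    current = list(clones)
    rounds = 0
    while len(current) > 1:
        size = -(-len(current) // pools_per_round)  # ceiling division
        pools = [current[i:i + size] for i in range(0, len(current), size)]
        positive = [p for p in pools if is_red_positive(p)]
        if not positive:
            raise ValueError("no RED-positive pool found")
        current = positive[0]
        rounds += 1
    return current[0], rounds


# Hypothetical library of 100 clones in which clone 37 carries the repeat.
clone, rounds = rapid_enrich(range(100), lambda pool: 37 in pool)
print(clone, rounds)
```

Because each round discards the negative pools, the number of assays grows with the number of rounds times the pools per round rather than with the size of the library.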

Using this technology, the expanded SCA7 allele from the genomic DNA of
an individual with ataxia and retinal degeneration can be purified and cloned. In
addition, a DNA fragment containing a novel expanded repeat can be purified and
cloned from the genomic DNA of an individual with clinical features similar to
myotonic dystrophy. Isolating the SCA7 and Mnl CAG repeat expansions and the
respective flanking DNA directly from the genomic DNA of single individuals
illustrates the advantages of RAPID cloning for the identification of pathogenic repeat
expansions for cases in which large pedigrees are unavailable, and demonstrates the
dramatically improved efficiency with which putative trinucleotide disease genes can
be isolated, characterized, and evaluated.

In many respects a properly optimized method of performing RED analysis
provides an ideal assay for tracking novel trinucleotide repeat expansions through
purification and cloning procedures and has distinct advantages over other cloning
assays that have been described (e.g., stringent hybridization and antibodies to
polyglutamine tracts). The use of genomic DNA allows the isolation of any potential
trinucleotide repeat expansion regardless of the expression pattern. Utilization of
different oligonucleotides in the RED assay allows any of the possible trinucleotide
repeats to be detected, and the cycled nature of the reaction makes it extremely
sensitive. Most importantly, the ligation component of this assay allows RED to
measure the approximate repeat length of the expansion present in the DNA template.
The repeat length discrimination provided by RED is maintained as the concentration
of target sequence in the template increases. This information provides a clear and
robust means of distinctly identifying the expansion target throughout the purification
process.


1. Fractionation of DNA fragments and detecting the fraction that
contains an extended repeat (2D-RED)

RED analysis involves directly testing a sample of genomic DNA for the
presence of an expanded repeat. When RED analysis is performed on a genomic
DNA sample that contains multiple trinucleotide expansions, the RED ladders
generated from the expansions are superimposed on one another and the size of the
largest ligation product corresponds with the approximate size of the largest
expansion. Thus, it is difficult to evaluate whether the genomic DNA contains more
than one expanded repeat. In 2D-RED analysis the multiple expansions present in a
genomic sample are physically separated from one another prior to RED analysis
using size-fractionation of genomic restriction fragments. In addition to identifying an
enriched genomic size fraction for use in subsequent cloning and isolation procedures,
the 2D-RED assay measures both the number and size of the expansions present in the
genome. The end result of the fractionating step is the identification of a population
of DNA fragments where at least one of the DNA fragments contains an expanded
repeat region.
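The difference between RED on unfractionated DNA and 2D-RED can be illustrated with a small data sketch. The Python below is hypothetical throughout: the fraction names, the per-fraction product sizes, and the function names are invented for the example, and the sizes are treated simply as the largest RED product scored for each fraction.

```python
from typing import Dict, List, Optional, Tuple


def red_unfractionated(fractions: Dict[str, Optional[int]]) -> Optional[int]:
    """Superimposed ladders: only the largest product can be read out."""
    sizes = [s for s in fractions.values() if s is not None]
    return max(sizes) if sizes else None


def red_2d(fractions: Dict[str, Optional[int]],
           min_size: int) -> List[Tuple[str, int]]:
    """Per-fraction readout: every fraction at or above min_size is reported."""
    return [(name, size) for name, size in fractions.items()
            if size is not None and size >= min_size]


# Hypothetical largest RED product per size fraction (None = no product).
fractions = {"F1": None, "F2": 30, "F3": None, "F4": 70, "F5": 40}

print(red_unfractionated(fractions))   # a single value for the whole sample
print(red_2d(fractions, min_size=40))  # number and size of separate expansions
```

With the unfractionated readout the two smaller expansions in this invented sample are hidden behind the largest one, whereas the per-fraction readout reports each positive fraction together with its size.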

Genomic DNA used in 2D-RED can be isolated from any individual.
Preferably, genomic DNA is isolated from a human individual affected with a disease
that may be associated with the expansion of a repeat. Any method known to the art
for isolating genomic DNA can be used.

Genomic DNA is digested with a restriction enzyme to yield DNA fragments.
Any restriction enzyme can be used. Preferably, the restriction enzyme will be
Sau3A, MboI or EcoRI. Methods detailing the use of restriction endonucleases to
digest DNA can be found in Sambrook et al., Molecular Cloning: A Laboratory
Manual; Cold Spring Harbor Laboratory: New York (1989).
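The effect of the digestion step can be illustrated in silico. The Python sketch below is a simplified model, not part of the described method: it considers a linear sequence and the top strand only, the function name and test sequence are hypothetical, and the recognition sites used (GAATTC for EcoRI, cut after the first G; GATC for Sau3A and MboI, cut before the G) are the standard sites for these enzymes rather than values taken from this document.

```python
from typing import List

# Recognition site and top-strand cut offset within the site.
SITES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "Sau3A": ("GATC", 0),    # ^GATC
    "MboI": ("GATC", 0),     # ^GATC
}


def digest(sequence: str, enzyme: str) -> List[str]:
    """Return the fragments of a linear sequence after complete digestion."""
    site, offset = SITES[enzyme]
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip empty fragments (for example, a site at the very start).
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical 18-nucleotide sequence containing two EcoRI sites.
fragments = digest("AAGAATTCTTGAATTCGG", "EcoRI")
print(fragments, [len(f) for f in fragments])
```

The fragment lengths produced this way are what the subsequent size-fractionation step separates.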

Digested DNA is resolved to divide the population of DNA fragments by size.
The DNA fragments can be resolved by any method available. Preferably, digested
DNA will be resolved by gel electrophoresis, more preferably by gel electrophoresis
with low melting-point agarose. The resolved DNA is separated into discrete size
fractions. For instance, if the DNA fragments were resolved by gel electrophoresis,
the portion of the gel containing the DNA fragments can be excised with a razor
blade. This gel segment can then be dissected into uniform slices using, for instance, a
gel-slicing device. After separation of the DNA fragments into separate fractions, the
DNA fragments in each fraction can be purified.

To detect those fractions that contain at least one DNA fragment that contains
an expanded repeat, any assay for detecting an expanded repeat can be used.
Preferably, each fraction is subjected to RED analysis. The oligonucleotide used in
RED analysis is preferably 5' phosphorylated. Any oligonucleotide that is
complementary to the repeat that is being analyzed can be used. For instance, to
detect the presence of the CAG repeat, an oligonucleotide with multiple repeats of the
sequence CTG would be used. Preferably, when the repeat is a trinucleotide repeat
the total length of the oligonucleotide will be 30 nucleotides. RED analysis can be
performed as described (Schalling, M., et al., Nature Genetics, 4, 135-139 (1993),
Lindblad, K., et al., Nature Genetics, 7, 124 (1994), Lindblad, K., et al., Genome
Research, 6, 965-971 (1996)). One aspect of this invention involves identifying
expanded trinucleotide repeats from genomic DNA comprising performing RED
analysis on a sample of genomic DNA wherein the rate of temperature change from
the denaturation temperature is decreased from approximately 0.1 second per degree,
and wherein the ligation buffer contains formamide. Thus, the rate of temperature
change from the denaturation temperature is less than approximately 0.1 second per
degree; preferably, the rate of temperature change from the denaturation temperature is
decreased to 2 seconds per degree, and the ligation buffer contains 4% formamide.

The products of the assay to detect the presence of an expanded repeat in each
fraction are resolved. This allows the products of the assay to be detected. For
instance, if RED analysis was performed, the RED products are resolved by gel
electrophoresis. Detection of the products can be by any method. Preferably, the
products are transferred to a membrane and detected by hybridization with a labeled
DNA probe that is complementary to the sequence of the oligonucleotide used to
detect the expanded repeat. For instance, when RED analysis is performed with the
(CTG)10 30-mer, the probe used to detect the product of RED analysis would be a
(CAG)10 30-mer.
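The relationship between the repeat being analyzed, the RED oligonucleotide, and the detection probe can be sketched in a few lines of Python. The function names are hypothetical. The last function is an idealized assumption rather than a statement from the text: it supposes that the largest ligation product is limited only by how many whole oligonucleotides fit end to end on the repeat tract.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def red_oligo(repeat_unit: str, total_length: int = 30) -> str:
    """Oligonucleotide complementary to the repeat, e.g. (CTG)10 for CAG."""
    unit = reverse_complement(repeat_unit)
    return unit * (total_length // len(unit))


def max_ligation_product_nt(repeats: int, unit_length: int = 3,
                            oligo_length: int = 30) -> int:
    """Idealized upper bound on the ligated multimer length (assumption)."""
    whole_oligos = (repeats * unit_length) // oligo_length
    return whole_oligos * oligo_length


oligo = red_oligo("CAG")           # the (CTG)10 30-mer
probe = reverse_complement(oligo)  # the (CAG)10 30-mer used for detection
print(oligo)
print(probe)
print(max_ligation_product_nt(45))  # idealized value for a 45-repeat tract
```

The same functions apply to other trinucleotide repeats by passing a different repeat unit, which mirrors the statement that different oligonucleotides allow any of the possible repeats to be detected.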

2. Cloning Sequences Flanking a CAG Repeat

Following identification of a DNA fraction containing at least one extended
repeat, the DNA fragments present in the fraction are cloned. Cloning provides for
the eventual isolation of a single DNA fragment that contains an extended repeat.
Moreover, cloning a single DNA fragment that contains an extended repeat allows the
nucleotides flanking the extended repeat to be determined.

The DNA fragments present in the DNA fraction containing at least one
extended repeat are preferably inserted into a replicable vector for further cloning
(amplification of the DNA). Many vectors are available, and each replicable vector
contains various structural components depending on the host cell with which it is
compatible. These components are described in detail below.

Construction of suitable vectors employs standard ligation techniques known
in the art. Isolated vectors and DNA fragments are cleaved, tailored, and religated in
the form desired to generate the recombinant vectors required. Typically, the ligation
mixtures are used to transform E. coli K12 strain 294 (ATCC 31,446) and successful
transformants are selected by ampicillin or tetracycline resistance where appropriate.
Plasmids from the transformants are prepared, analyzed by restriction endonuclease
digestion, and/or sequenced by methods known in the art. See, e.g., Messing et al.,
Nucl. Acids Res., 9, 309 (1981) and Maxam et al., Methods in Enzymology, 65, 499
(1980).

Alternatively and preferably, the DNA fragments present in the DNA fraction
containing at least one extended repeat are inserted into a lambda vector. The resulting
recombinant vectors are used to transduce a suitable bacterial strain.

Suitable host cells for cloning or expressing the vectors described here are
prokaryotes including eubacteria, such as Gram-negative or Gram-positive organisms,
for example, E. coli, Bacillus species such as B. subtilis, Pseudomonas species such as
P. aeruginosa, Salmonella typhimurium, or Serratia marcescens. Preferred E. coli
cloning hosts include E. coli 294 (ATCC 31,446), E. coli XL1-Blue MRF', E. coli SOLR,
and E. coli CJ236, although other strains such as E. coli B, E. coli X1776 (ATCC 31,537),
and E. coli W3110 (ATCC 27,325) are suitable. These examples are illustrative
rather than limiting.

Host cells are transformed and preferably transduced with the above-described
cloning vectors for this aspect of the invention and cultured in conventional nutrient
media modified as appropriate for selecting transductants and amplifying the genes
encoding the desired sequences.

Transduction means introducing DNA into an organism so that the DNA is
replicable, either as an extrachromosomal element or by chromosomal integrant. The
vector used in transduction is typically derived from a bacteriophage, preferably
lambda. Following ligation of insert DNA fragments with the bacteriophage vector,
the recombinant bacteriophage vector is packaged into the appropriate protein phage
particle. The resulting population of phage particles is used to infect an appropriate
bacterial strain. Depending on the host cell and phage used, transduction is done
using standard techniques appropriate to such cells.

After cloning the DNA fragments present in the fraction that contains at least
one extended repeat, the individual clones that contain the extended repeat are
identified and isolated. One method to identify the individual clone involves
screening the entire library of clones by plating host cells containing the library of
clones on an appropriate media. The media optionally contains an appropriate
selective agent, preferably an antibiotic. The individual colonies or plaques are then
assayed by methods well known in the art, including hybridization with a labeled
DNA probe that is complementary to the sequence of the oligonucleotide used to
detect the expanded repeat.

Alternatively and preferably, the library of clones will be used in several
post-cloning enrichment steps. These steps are similar to procedures well known in the art
(Ostrander et al., Proc. Natl. Acad. Sci. U.S.A., 89, 3419-3423 (1992); Kunkel et al.,
Methods in Enzymology, 154, 367-382 (1987)). The result is the identification of
individual clones, each of which contains a DNA fragment containing a trinucleotide
repeat.

After isolation of individual clones, the nucleotide sequence of the DNA
flanking the CAG repeats can be determined by techniques well known to the art. The
sequence of the flanking DNA can be used to locate the position of the DNA on a
physical map using techniques well known to the art. The sequence of the flanking
DNA can also be used to clone the full length gene.

3. Cloning Full Length Genes Using Sequences That Flank an
Extended Repeat

The present invention relates to nucleic acid molecules containing an extended
repeat, including nucleic acid molecules corresponding to entire genes containing an
extended repeat and portions thereof. Preferably the extended repeat is the CAG
repeat region of an isolated spinocerebellar ataxia type 7 gene, and nucleic acid
molecules corresponding to the entire SCA7 gene and portions thereof. The present
invention further relates to vectors and isolated recombinant vectors comprising the
entire SCA7 gene and portions thereof, including an isolated recombinant vector
comprising the nucleotides of SEQ ID NO:1 or SEQ ID NO:2 operatively linked to
heterologous vector sequences.

As used herein, the term "isolated" means that the nucleic acid
molecule, gene, or oligonucleotide is essentially free from the remainder of the human
genome and associated cellular or other impurities. This does not mean that the
product has to have been extracted from the human genome; rather, the product could
be a synthetic or cloned product, for example. As used herein, the term "nucleic acid
molecule" means any single or double-stranded RNA or DNA molecule, such as
mRNA, cDNA, and genomic DNA.

Cloning of DNA into the appropriate replicable vectors provides for
determining the sequences that flank an extended repeat and subsequent isolation of
the full length gene. Cloning allows expression of the gene product and makes the
gene available for further genetic engineering. Expression of the gene product or
portions thereof is useful because these gene products can be used as antigens to
produce antibodies, as described in more detail below, and in U.S. Patent Application,
Serial No. 08/267,803, filed June 28, 1994.

1. Isolation of DNA

DNA containing a gene containing an expanded repeat may be obtained from
any cDNA library prepared from tissue believed to possess the mRNA encoded by the
gene and to express it at a detectable level. Optionally, the SCA7 gene may be
obtained from a genomic DNA library or by in vitro oligonucleotide synthesis from
the complete nucleotide or amino acid sequence.

Libraries are screened with appropriate probes designed to identify the gene of
interest or the protein encoded by it. Preferably, the probes are derived from the
nucleotide sequence flanking the extended repeat. Screening the cDNA or genomic
library with the selected probe may be accomplished using standard procedures.
Screening cDNA libraries using synthetic oligonucleotides as probes is a preferred
method of practicing this invention. The oligonucleotide sequences selected as probes
should be of sufficient length and sufficiently unambiguous to minimize false
positives. When screening a library that contains DNA from different species, the
actual nucleotide sequence(s) of the probe(s) is usually designed based on regions of
the nucleotides flanking the extended repeat that have the least codon redundancy.
The oligonucleotides may be degenerate at one or more positions, i.e., two or more
different nucleotides may be incorporated into an oligonucleotide at a given position,
resulting in multiple synthetic oligonucleotides. The use of degenerate
oligonucleotides is of particular importance where a library is screened from a species
in which preferential codon usage is not known.
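How a degenerate oligonucleotide corresponds to multiple synthetic oligonucleotides can be shown by expanding each degenerate position into its possible nucleotides. The Python sketch below uses the standard IUPAC nucleotide ambiguity codes, which are not listed in this document; the function name and the example sequence are hypothetical.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo: str) -> List[str]:
    """List every specific oligonucleotide encoded by a degenerate one."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical probe with two degenerate positions (R = A/G, Y = C/T).
variants = expand_degenerate("CARYT")
print(len(variants), variants)
```

The number of specific oligonucleotides is the product of the number of choices at each position, so degeneracy at many positions quickly enlarges the probe mixture.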

The oligonucleotide can be labeled such that it can be detected upon
hybridization to DNA in the library being screened. A preferred method of labeling is
to use ATP and polynucleotide kinase to radiolabel the 5' end of the oligonucleotide.
However, other methods may be used to label the oligonucleotide, including, but not
limited to, biotinylation or enzyme labeling.

Embodiments of nucleic acid molecules of this invention include isolated
DNA fragments comprising bases 1-128 of SEQ ID NO:1, bases 286-476 of SEQ ID
NO:1, bases 1-128 of SEQ ID NO:1 and further comprising a CAG repeat region, and
bases 1-128 of SEQ ID NO:1 in a vector. Other embodiments of nucleic acid
molecules of this invention include isolated DNA fragments comprising bases
922-1002 of SEQ ID NO:2, bases 1033-1864 of SEQ ID NO:2, bases 922-1002 of SEQ ID
NO:2 and further comprising a CAG repeat region, and bases 922-1002 of SEQ ID
NO:2 in a vector.

Of particular interest is the SCA7 nucleic acid that encodes a full-length
mRNA transcript, including the complete coding region for the gene product,
ataxin-7. Nucleic acid containing the complete coding region can be obtained by screening
selected cDNA libraries using the deduced amino acid sequence.

An alternative means to isolate the gene containing an expanded repeat is to
use PCR methodology. This method requires the use of oligonucleotide primer
probes that will hybridize to the SCA7 gene. Strategies for selection of PCR primer
oligonucleotides are described below.

2. Insertion of DNA into Vector

The nucleic acid (e.g., cDNA or genomic DNA) containing the gene
containing an expanded repeat is preferably inserted into a replicable vector for
further cloning (amplification of the DNA) or for expression of the gene product.
Many vectors are available, and selection of the appropriate vector will depend on: 1)
whether it is to be used for DNA amplification or for DNA expression; 2) the size of
the nucleic acid to be inserted into the vector; and 3) the host cell to be transformed
with the vector.

Construction of suitable vectors employs standard ligation techniques known
in the art. Isolated plasmids or DNA fragments are cleaved, tailored, and religated in
the form desired to generate the plasmids required. Typically, the ligation mixtures
are used to transform E. coli K12 strain 294 (ATCC 31,446) and successful
transformants are selected by ampicillin or tetracycline resistance where appropriate.
Plasmids from the transformants are prepared, analyzed by restriction endonuclease
digestion, and/or sequenced by methods known in the art. See, e.g., Messing et al.,
Nucl. Acids Res., 9, 309 (1981) and Maxam et al., Methods in Enzymology, 65, 499
(1980).

Replicable cloning and expression vector components generally include, but
are not limited to, one or more of the following: a signal sequence, an origin of
replication, one or more marker genes, an enhancer element, a promoter and a
transcription termination sequence. At this time a large number of each of these
components that are recognized by a variety of potential host cells are well known to
the art. It is also well known to the art that a component can be removed from its
source DNA using standard molecular biology techniques and used in conjunction
with other components that are endogenous to a particular species, e.g., prokaryotes,
filamentous fungi, yeast, protozoa, and vertebrate, invertebrate and plant cell culture.
Alternatively, heterologous components can be used together to result in the stable
replication of a cloned DNA, or the expression of a protein encoded by a cloned
DNA. A non-limiting description of components that can be used in cloning genes
containing expanded trinucleotide repeats can be found in U.S. Patent Application,
Serial No. 08/267,803, filed June 28, 1994.

3. Host Cells

Suitable host cells for cloning or expressing the vectors herein are prokaryotes,
filamentous fungi, yeast, protozoa, and higher eukaryotic cells including vertebrate,
invertebrate and plant cells. Preferably the host cell should secrete minimal amounts
of proteolytic enzymes.

Suitable host cells for the expression of a glycosylated protein encoded by a
gene containing an expanded repeat are derived from multicellular organisms. Such
host cells are capable of complex processing and glycosylation activities. In principle,
any higher eukaryotic cell culture is workable, whether from vertebrate or invertebrate
culture.

Propagation of vectors containing cloned DNA in host cells has become a 
routine procedure in recent years and is well known to the art. 

Alternatively, in vitro methods of cloning, e.g., PCR or other nucleic acid 
polymerase reactions, are suitable. 

4. Transfection and transformation

Host cells are transfected and preferably transformed with the above-described
expression or cloning vectors of this invention and cultured in conventional nutrient
media modified as appropriate for inducing promoters, selecting transformants, or
amplifying the genes encoding the desired sequences.

Numerous methods of treating a host cell to promote the uptake of a vector
containing cloned DNA are known to the art, including, for example, calcium
phosphate precipitation, electroporation, calcium chloride treatment, nuclear injection,
protoplast fusion, and microprojectile bombardment.

The culture of host cells containing the cloning vector in suitable media so as 
to promote viability of the host cells and carriage of the cloning vector is well known 
to the art. Any necessary supplements may also be included at appropriate 
concentrations that would be known to those skilled in the art. The culture conditions, 
such as temperature, pH, and the like will be apparent to the ordinarily skilled artisan. 
The host cells referred to in this disclosure encompass in vitro culture as well as cells 
that are within a host animal. 

C. Protein

A gene containing an extended repeat encodes a protein that can be purified
from a host cell expressing the recombinant gene. For instance, the SCA7 gene
encodes a novel protein, ataxin-7, a representative example of which is shown in SEQ
ID NO:12, and a portion of which is shown in SEQ ID NO:11. Thus, the present
invention is related to polypeptides comprising amino acids encoded by the nucleic
acids of genes containing an extended trinucleotide repeat, preferably the nucleic acid
of SEQ ID NO:1 or SEQ ID NO:2. The present invention is also related to an isolated
recombinant vector and an isolated nucleic acid fragment that is capable of expressing
a polypeptide comprising amino acids of a protein that contains a polyglutamine
region. Preferably, the amino acids comprise amino acids 1-27 of SEQ ID NO:11 or
SEQ ID NO:12. The invention is further related to a protein encoded by the SCA7
gene, wherein the protein contains between 5 and 110 CAG repeats and has a
molecular weight of approximately 95-108 kD.
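The width of the stated molecular weight range can be related to the repeat range by simple arithmetic, since each CAG codon encodes one glutamine residue. The Python sketch below is a back-of-the-envelope check, not a calculation taken from the document: the function name is hypothetical, and the residue mass of about 128.13 Da is the standard average mass of a glutamine residue within a peptide chain.

```python
GLN_RESIDUE_DA = 128.13  # average mass of one glutamine residue in a peptide


def polyq_mass_shift_kd(repeats_low: int, repeats_high: int) -> float:
    """Mass added (in kD) by going from repeats_low to repeats_high glutamines."""
    return (repeats_high - repeats_low) * GLN_RESIDUE_DA / 1000.0


# Difference between the smallest and largest repeat counts stated above.
shift = polyq_mass_shift_kd(5, 110)
print(round(shift, 1))
```

Under these assumptions, the mass contributed by the additional glutamines is of the same order as the 13 kD spread between 95 and 108 kD.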

It is to be understood that ataxin-7 represents a set of proteins produced from
the SCA7 gene with its unstable CAG region. Ataxin-7 can be produced from cell
cultures. With the aid of recombinant DNA techniques, synthetic DNA and cDNA
coding for ataxin-7 can be introduced into microorganisms which can then be made to
produce the peptide. It is also possible to manufacture ataxin-7 synthetically, in a
manner such as is known for peptide syntheses.

Ataxin-7 is preferably recovered from the culture medium as a cytosolic
polypeptide, although it can also be recovered as a secreted polypeptide when
expressed with a secretory signal.

Ataxin-7 can be purified from recombinant cell proteins or polypeptides to
obtain preparations that are substantially homogenous as ataxin-7 using techniques
well known to the art. The following procedures are exemplary of suitable
purification procedures: fractionation on immunoaffinity or ion-exchange columns;
ethanol precipitation; reverse phase HPLC; chromatography on silica or on a
cation-exchange resin such as DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate
precipitation; gel filtration using, for example, Sephadex G-75; ligand affinity
chromatography, using, e.g., protein A Sepharose columns to remove contaminants
such as IgG.

Ataxin-7 variants in which residues have been deleted, inserted, or substituted
are recovered in the same fashion as native ataxin-7, taking account of any substantial
changes in properties occasioned by the variation. A protease inhibitor such as phenyl
methyl sulfonyl fluoride (PMSF) also may be useful to inhibit proteolytic degradation
during purification, and antibiotics may be included to prevent the growth of
adventitious contaminants.

Both native ataxin-7 and amino acid sequence variants of ataxin-7 may be
covalently modified. Covalent modifications of
ataxin-7 or fragments thereof can be introduced into the molecule by reacting targeted
amino acid residues of the ataxin-7 or fragments thereof with a derivatizing agent
capable of reacting with selected side chains or the N- or C-terminal residues.

D. Antibodies

The present invention also relates to polyclonal or monoclonal antibodies
raised against a full length protein encoded by a gene containing an extended repeat,
or a fragment thereof (preferably fragments having 8-40 amino acids, more preferably
10-20 amino acids, that form the surface of the folded protein), or variants thereof,
and to diagnostic methods based on the use of such antibodies, including but not
limited to Western blotting and ELISA (enzyme-linked immunosorbent assay). For
instance, the present invention relates to polyclonal or monoclonal antibodies raised
against ataxin-7 or ataxin-7 fragments, and to diagnostic methods based on the use of
such antibodies.

Polyclonal antibodies to the SCA7 polypeptide generally are raised in animals
by multiple subcutaneous (sc) or intraperitoneal (ip) injections of ataxin-7, ataxin-7
fragments, or variants thereof, and an adjuvant using techniques well known to the art.
The route and schedule of immunizing a host animal or removing and culturing
antibody-producing cells are variable and are generally in keeping with established
and conventional techniques for antibody stimulation and production. Serum
antibodies (IgG) are purified via protein purification protocols that are well known in
the art.

Monoclonal antibodies are prepared by recovering immune cells - typically
spleen cells or lymphocytes from lymph node tissue - from immunized animals
(usually mice) and immortalizing the cells in conventional fashion, e.g., by fusion
with myeloma cells. The hybridoma technique described originally by Kohler et al.,
Eur. J. Immunol., 6, 511 (1976) has been widely applied to produce hybrid cell lines
that secrete high levels of monoclonal antibodies against many specific antigens. The
production and purification of monoclonal antibodies is well known to the art.

The anti-ataxin-7 antibody preparations of the present invention are specific to
ataxin-7 and do not react immunochemically with other substances in a manner that
would interfere with a given use. For example, they can be used to screen for the
presence of ataxin-7 in tissue extracts to determine tissue-specific expression levels of
ataxin-7.


The present invention also encompasses an immunochemical assay that
involves subjecting antibodies directed against ataxin-7 to reaction with the ataxin-7
present in a sample to thus form an (ataxin-7/anti-ataxin-7) immune complex, the
formation and amount of which are measures - qualitative and quantitative,
respectively - of the ataxin-7 presence in the sample. The addition of other reagents
capable of biospecifically reacting with constituents of the protein/antibody complex,
such as anti-antibodies provided with analytically detectable groups, facilitates
detection and quantification of ataxin-7 in biological samples, and is especially useful
for quantitating the level of ataxin-7 in biological samples. Ataxin-7/anti-ataxin-7
complexes can also be subjected to amino acid sequencing using methods well known
in the art to determine the length of a polyglutamine region and thereby provide
information about likelihood of affliction with spinocerebellar ataxia and likely age of
onset. Competitive inhibition and non-competitive methods, precipitation methods,
heterogeneous and homogeneous methods, various methods named according to the
analytically detectable group employed, immunoelectrophoresis, particle
agglutination, immunodiffusion and immunohistochemical methods employing
labeled antibodies may all be used in connection with the immune assay described
above.

The invention has been described with reference to various specific and
preferred embodiments and will be further described by reference to the following
detailed examples. It is understood, however, that there are many extensions,
variations, and modifications on the basic theme of the present invention beyond that
shown in the examples and detailed description, which are within the spirit and scope
of the present invention.


EXPERIMENTAL SECTION 
RAPID cloning was used to identify, characterize and evaluate two novel
CAG expansions. The first CAG expansion was isolated from the genomic DNA of
an individual with a genetically distinct form of myotonic dystrophy. The second
CAG expansion was isolated from the genomic DNA of an individual with ataxia and
retinopathy. These results demonstrate the advantages of RAPID cloning for the
identification of pathogenic repeat expansions for cases in which large pedigrees are
unavailable, and illustrate the dramatically improved efficiency with which putative
trinucleotide disease genes can be isolated, characterized and evaluated.
A. Methods

1. Clinical resources.
The ataxia patients were collected and examined as described (Ranum, L.P.W.
et al., Am. J. Hum. Genet., 57, 603-608 (1995)). For the family with a genetically
distinct form of myotonic dystrophy, neurological exams and EMGs were performed
with the presence of myotonic discharges used as the diagnostic criterion for the
classification of MN1 family members as affected. After informed consent was
obtained, blood was collected in two acid citrate dextrose (ACD) tubes (Vacutainer
#4606) and one sodium heparin tube (Vacutainer #6480). DNA was isolated from
blood in the ACD tubes (Puregene #D-5003, Gentra Systems, Research Triangle
Park, North Carolina) and EBV transformed lymphoblastic cell lines (LCLs) were
established using blood from the heparin tubes.

DNA samples from the grandparents of the panel of 40 Centre d'Etude du
Polymorphisme Humain (CEPH) reference families were used as normal controls for
the SCA7 PCR assay (Marx, J. Science, 229, 150-151 (1985)).

2. Optimized RED conditions.

RED analysis was performed as described by Schalling et al. and Lindblad et
al. (Schalling, M., et al., Nature Genetics, 4, 135-139 (1993), Lindblad, K., et al.,
Nature Genetics, 7, 124 (1994), and Lindblad, K. et al., Genome Research, 6, 965-971
(1996)) with the following modifications: 2 µg of genomic DNA were used; 4%
formamide was added to the reaction; the reaction conditions were denaturation at
94°C (4 min, 50 sec) followed by 495 cycles of 94°C for 10 seconds and 70°C to 78°C
for 40 seconds; a 2 sec/degree ramp was used when the reaction temperature dropped
from 94°C. The oligonucleotide used in the RED reaction is a PAGE purified, 5'
phosphorylated, (CTG)10 30-mer synthesized by National Biosciences Inc.
(Plymouth, MN). All reactions were performed using an Omnigene Hybaid thermal
cycler with a heated lid. RED products were separated by size on a denaturing 6%
polyacrylamide gel containing 6M urea and transferred to Hybond N+ (Amersham) as
described (Schalling, M., et al., Nature Genetics, 4, 135-139 (1993), Lindblad, K., et
al., Nature Genetics, 7, 124 (1994), Lindblad, K. et al., Genome Research, 6, 965-971
(1996)), and hybridized with a 3' labeled (CAG)10 30-mer.
3. Two dimensional RED (2D-RED) analysis of genomic DNA.

Ten µg of genomic DNA was digested with a restriction enzyme and size-separated
on a 1.5% SeaPlaque GTG (FMC, Rockland, ME) low melting-point
agarose gel in 1x TAE buffer. The DNA was visualized after ethidium bromide
staining on a UV transilluminator and the portion of the gel containing DNA was
excised with a razor blade. This gel segment was then dissected into uniform 2 mm
slices using a gel-slicing device in which microscope coverslips are used as disposable
dissecting blades. These slices were placed in separate 0.5 ml PCR tubes, heated to
72°C for 10 min to melt, and then equilibrated at 42°C. One µl AgarACE enzyme
(0.2 U, Promega, Madison, WI) was added and the samples were incubated for 3 hours
to completely digest the agarose. The size separated DNA was concentrated by EtOH
precipitation, dried, and resuspended in 7 µl of 10 mM Tris, 1 mM EDTA (pH 7.5)
buffer. RED analysis was performed on 3.5 µl of DNA from each fraction to
determine which size fraction was most highly enriched for the RED positive genomic
fragments.

4. Cloning of genomic fragments.

EcoRI digested genomic DNA recovered from the RED positive gel fraction
was cloned using the predigested Lambda ZAP II/EcoRI/CIAP cloning and packaging
kit (Stratagene, La Jolla, CA). After the initial library was generated and titered, eight
plates containing 5x10^4 primary clones/plate were amplified separately as described
in the protocols provided by the manufacturer. ssDNA derivatives of these libraries
were then generated by coinfecting E. coli strain XL1-Blue MRF' with both the
amplified lambda library and the ExAssist M13 helper phage. The SK- Bluescript
phagemids were used to infect the SOLR strain of E. coli (Stratagene, La Jolla, CA)
from which double stranded plasmid (pBluescript) DNA was purified using the
Wizard M13 DNA Purification System (Promega). The plasmid DNA representing
each clone pool was then assayed by RED analysis for the presence of expanded
repeats.

5. Post-cloning enrichment

The enrichment of CAG containing clones is an adaptation of the general
approach described by Ostrander et al. (Proc. Natl. Acad. Sci. U.S.A., 89, 3419-3423
(1992)), which is based on the selection schemes established by Kunkel et al.
(Methods in Enzymology, 154, 367-382 (1987)). Plasmid DNA (~0.1 µg)
representing a RED positive pool of clones was electroporated into E. coli strain
CJ236 (dut-, ung-; BioRad, Hercules, CA) to generate uracil-substituted DNA. After
a 1 hour recovery without antibiotic, the cells were inoculated into 200 ml of LB
containing 40 µg/ml ampicillin and incubated at 37°C with shaking for 3 hr. To
convert the plasmid dsDNA to ssDNA, M13K07 helper phage (1x10^10 pfu) was
added, and the culture was incubated at 37°C with shaking for an additional 2 hrs.
Kanamycin was then added, the culture was incubated for 18 hours, and single
stranded DNA was purified essentially as described (Vieira, J., et al., Methods in
Enzymology, 151, 3-11 (1987)), but with the addition of a DNase I treatment (2000 u,
1 hr, 37°C) prior to phage precipitation. The purified ssDNA was incubated with T4
DNA polymerase (New England Biolabs, Beverly, MA) overnight without dNTP to
eliminate contaminating DNA that could act as primers. The CTG repeat containing
ssDNA was then converted to dsDNA by primer extension using AmpliTaq (Perkin
Elmer) and a (CAG)10 primer at 72°C in a buffer containing 4% formamide. One µl
of uracil DNA glycosylase (UDG, GIBCO-BRL, Gaithersburg, MD) was added after
extension to degrade the remaining ssDNA. After extraction (1x phenol:CHCl3 and
1x CHCl3) and EtOH precipitation, the DNA was electroporated into the SURE strain
of E. coli (Stratagene, La Jolla, CA).

6. Sequence analysis.

Genomic sequences from the ends of the SCA7 EcoRI fragment were used in
a nucleotide BLAST search of the GenBank database and revealed an overlap with
three related EST sequences (accession numbers H40285, H40290, H41756). A
search of the human EST map (http://www.ncbi.nlm.nih.gov/SCIENCE96/) (Schuler,
G. et al., Science, 274, 540-546 (1996)) revealed that these ESTs have been mapped to
the same region of chromosome 3p to which the SCA7 mutation has been mapped
(Gouw, L.G. et al., Nature Genetics, 10, 89-93 (1995), Benomar, A. et al., Nature
Genetics, 10, 84-88 (1995), and David, G. et al., American Journal of Human
Genetics, 59, 1328-1336 (1996)). A BLAST search with the nucleotide sequence
flanking the SCA7 CAG repeat revealed that an unexpanded allele with ten CAG
repeats was present in the database and reported as a chromosome 3 NotI jumping
clone (accession number X95831). The sequence flanking the MN1 CAG repeat
did not match any sequence present in the database.

7. PCR assays of expanded trinucleotide repeats.

The SCA7 repeat expansion assay was done using the SCA7-F1 (5'
TTTTTTGTTACATTGTAGGAGCG) (SEQ ID NO:5) and SCA7-R1
(5' CACTTCAGGACTGGGCAGAG) (SEQ ID NO:6) primers in a PCR reaction (50
ng genomic DNA, 200 µM dNTP, 10 mM Tris pH 9.0, 50 mM KCl, 0.1% Triton X-100,
1.5 mM MgCl2, 10% DMSO, 0.1 units AmpliTaq) cycled 35x (94°C 50 s, 55°C
1 min, 72°C 1 min).

The MN1 CAG repeat assay was performed using primers MN4F (5'
GCCAGATGAGTTTGGTGTAAGAT) (SEQ ID NO:7) and MN4R (5'
AAGCCATTTCTCCAAAAGAAGGTC) (SEQ ID NO:8) in a PCR
reaction (20 ng genomic DNA, 200 µM dNTP, 10 mM Tris pH 9.0, 50 mM KCl,
0.1% Triton X-100, 0.01% gelatin, 1.5 mM MgCl2, 10% DMSO, 0.1 unit AmpliTaq)
cycled 35x (94°C 45 s, 52°C 1 min, 72°C 1 min).
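
Where an amplified allele is sequenced, the number of uninterrupted CAG repeats can be counted directly from the sequence. The following is a minimal Python sketch of that count, offered only as an illustration; the function name `longest_cag_run` and the example flanking bases are assumptions and are not part of the assays described above.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Return the number of repeats in the longest uninterrupted CAG tract.

    Whitespace (such as the 10-base grouping used in a sequence listing)
    is removed and the sequence is upper-cased before counting.
    """
    cleaned = re.sub(r"\s+", "", sequence).upper()
    # Each match is a maximal run of back-to-back CAG triplets.
    runs = re.findall(r"(?:CAG)+", cleaned)
    return max((len(run) // 3 for run in runs), default=0)


if __name__ == "__main__":
    # Hypothetical allele: short flanks around ten CAG repeats.
    allele = "GCCCGG" + "CAG" * 10 + "CCGCCGCC"
    print(longest_cag_run(allele))
```

The count reflects only the longest uninterrupted tract, so an interrupted repeat would be reported as the length of its longest pure segment.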

B. Results 

1. Optimization of the RED assay
To determine whether (CTG)10 RED analysis worked well, the available
protocols (Schalling, M., et al., Nature Genetics, 4, 135-139 (1993), Lindblad, K., et
al., Nature Genetics, 7, 124 (1994), Lindblad, K. et al., Genome Research, 6, 965-971
(1996)) were tested with defined genomic DNA control templates. RED results with
these templates were inconsistent and typically did not correlate with the size of the
largest known CAG expansion in the genomic sample. It was found that decreasing
the rate of temperature change during annealing and including 4% formamide in the
ligation buffer increased the sensitivity and reproducibility of this assay for detecting
uninterrupted CAG expansion genomic controls. RED analysis of genomic DNA from
individuals with CAG expansions of known size at the SCA7, SCA3, HD, and DM
disease loci using this optimized protocol is shown in Fig. 2a. For each of the positive
genomic control samples shown, the size of the largest RED product corresponds well
with the size of the known CAG expansion as measured by PCR assays. RED analysis
of the SCA3, HD, and DM genomic samples using a ligation buffer that did not
contain formamide is shown in Fig. 2b. The results obtained from this type of reaction
differ dramatically from those shown for the same samples in Fig. 2a. We have found
that multiple other alterations to the optimized reaction conditions, including the use
of different thermocyclers and poor DNA quality, can have similar deleterious effects
on the RED results we obtain.
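
The correspondence between repeat length and the largest RED product can be illustrated with a short calculation. This is a sketch under a stated assumption, not a description of the assay chemistry: it assumes each ligated (CTG)10 30-mer covers ten contiguous repeats, so an uninterrupted tract of n repeats supports at most floor(n/10) ligated oligonucleotides. The function names are hypothetical.

```python
from typing import Tuple

OLIGO_REPEATS = 10  # repeats covered by one (CTG)10 oligonucleotide
OLIGO_LENGTH = 30   # nucleotides in one (CTG)10 oligonucleotide


def largest_red_product(repeat_count: int) -> Tuple[int, int]:
    """Return (repeats covered, length in nt) of the largest expected multimer.

    Assumes oligonucleotides ligate only when annealed side by side on one
    uninterrupted repeat tract (an illustrative simplification).
    """
    if repeat_count < 0:
        raise ValueError("repeat_count must be non-negative")
    n_oligos = repeat_count // OLIGO_REPEATS
    return n_oligos * OLIGO_REPEATS, n_oligos * OLIGO_LENGTH


def red_label(repeat_count: int) -> str:
    """Name the product in the REDnn style used in the text (e.g. RED70)."""
    repeats, _ = largest_red_product(repeat_count)
    return f"RED{repeats}"


if __name__ == "__main__":
    for n in (9, 68, 73, 110):
        print(n, largest_red_product(n), red_label(n))
```

Under this simplification a RED product reports repeat length only to the nearest lower multiple of ten, which is why PCR sizing is still needed for an exact count.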

2. Two-dimensional RED analysis 

In a second part of this method, a two dimensional repeat expansion detection
assay (2D-RED) was developed that uses the physical size of RED positive genomic
DNA digestion fragments to uniquely identify multiple expanded alleles in a genomic
DNA sample (see Fig. 3). In the 2D-RED protocol, which is outlined schematically in
Fig. 3a, genomic DNA was digested with a restriction enzyme, run on an agarose gel,
and separated into discrete size fractions. The agarose from each fraction was
enzymatically removed, the DNA was concentrated by precipitation, and RED
analysis was performed on the fractions. Agarose gel analysis of a portion of the size
fractions generated from MboI-digested genomic DNA from an individual with
SCA3/MJD is shown in Fig. 3b. The corresponding RED analysis of these fractions
is shown in Fig. 3c. In this example, size fractions consisting of DNA fragments of
approximately 500 bp in length generate the RED70 product expected for the
expanded SCA3 allele present in the original genomic sample.

3. Cloning and assessment of a novel expanded CAG repeat 

A family (MN1) has been identified that has clinical features strikingly similar
to myotonic dystrophy but whose disease locus does not involve the chromosome 19
CTG repeat expansion and is not genetically linked to the DM region of chromosome
19. Fig. 4 shows the results of RED analysis performed on genomic DNA from a
control group of eight spouses and one affected member of the MN1 family. The
genomic DNA sample from the affected individual generated a RED product that was
substantially larger (110 CAG repeats) than any of the products generated by the
spouses' DNA samples. RED analysis of genomic DNA from additional family
members indicated that a CTG expansion of a similar size was present in at least five
other affected individuals and was not detectable in any samples from unaffected
family members. 2D-RED analysis of two of the RED positive samples showed that
the expansion was present on an MboI fragment of approximately the same size in both
genomes. These data suggested the intriguing possibility that a CTG repeat expansion
at a locus distinct from DM causes the MN1 form of myotonic dystrophy. To directly
assess whether or not the CTG expansion is involved in the disease, we cloned the
genomic fragment containing this expansion using the RAPID procedure, sequenced
the DNA flanking the repeat and performed PCR analysis of the repeat on the
extended kindred.

Genomic DNA from an affected member of the MN1 kindred was digested
with EcoRI and 2D-RED was performed to identify the RED-positive size fraction.
The DNA fragments from this fraction (approximately 2 kb in size) were then cloned
into the lambda ZAP II cloning vector (Stratagene). Eight pools consisting of
approximately 5x10^4 primary clones each were amplified and mass excision was
performed on each pool separately to convert the lambda clones into plasmids (see
Methods). RED analysis was performed on the isolated plasmid DNA from these
eight pools to determine which contained the cloned repeat (Fig. 5A). Four of the eight
pools were RED positive, and the DNA from two of these (pools 3 and 8) was
selectively enriched for CAG-containing clones (Fig. 5B, see Methods). The DNA
from small pools of individually inoculated clones (20 clones each) from this enriched
library was assayed by RED (Fig. 5C). Individual repeat-containing clones were
identified by RED analysis of DNA prepared from row and column pools inoculated
from RED positive plates. The genomic inserts from four of these clones were
sequenced. All four contained genomic inserts with identical nucleotide sequences
that varied only in the size of the CAG repeat.
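
The row and column pooling step lends itself to a small worked example. The sketch below assumes the clones on a RED positive plate are laid out on a rectangular grid and that each row and each column is pooled and assayed once; the grid layout and the function name `candidate_clones` are illustrative assumptions, since the text states only that row and column pools were used.

```python
from typing import Iterable, List, Tuple


def candidate_clones(
    positive_rows: Iterable[int], positive_columns: Iterable[int]
) -> List[Tuple[int, int]]:
    """Return grid positions consistent with the RED positive pools.

    A clone at (row, column) is a candidate only if both its row pool and
    its column pool were RED positive.
    """
    rows = sorted(set(positive_rows))
    columns = sorted(set(positive_columns))
    return [(r, c) for r in rows for c in columns]


if __name__ == "__main__":
    # One positive row and one positive column identify a single clone.
    print(candidate_clones([2], [5]))
    # Two positive rows and columns leave four candidates to test singly.
    print(candidate_clones([1, 3], [2, 4]))
```

When more than one row and column are positive, the intersections are only candidates, and each would still need an individual RED assay to confirm it.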
The genomic sequence flanking the CAG expansion isolated from the MN1
family is shown in Fig. 5D. The size of the CAG repeat (underlined) varied in the
clones obtained up to a maximum length of 110 uninterrupted repeats, a length that
corresponds well with the RED results obtained from the genomic DNA of the
individual from whom it was isolated. Once an isolated clone was obtained, however,
the insert appeared to be stable. In each case, the size of the repeat as determined by
sequencing corresponded with the approximate size indicated by RED (see Fig. 5C).

PCR primers were designed from this sequence to amplify across the CAG
repeat and PCR analysis of the MN1 kindred was performed (see Methods). This
analysis, the results of which are summarized in Fig. 6, showed that the unstable
expansion does not cosegregate with the disease. The unexpanded allele was highly
polymorphic and had between 11 and 24 uninterrupted CAG repeats (as confirmed by
sequence analysis of several alleles). The expanded allele varied in size between
individuals and had up to approximately 130 repeats in the individual with the largest
expansion. Each of the family members with an expansion at this allele was detected
in the original genomic RED analysis, with the exception of the two deceased
individuals for whom only relatively poor-quality DNA samples were available.
4. Cloning of the expanded SCA7 CAG repeat

This invention also relates to the isolation of nucleic acid containing a CAG
repeat region, hereinafter referred to as SCA7. During the past four years we have
collected blood samples from affected individuals representing 355 different families
with dominant, recessive or sporadic forms of adult onset ataxia (Ranum, L.P.W., et
al., Am. J. Hum. Genet., 57, 603-608 (1995)). To identify individuals whose ataxia
was likely to be caused by a novel trinucleotide repeat expansion, we selected patients
who were negative for the known dominant ataxia gene expansions (SCA1, 2, 3, and
6), and who had a dominant family history with anticipation. The proband of pedigree
A (Fig. 8) fit the above criteria. In addition to having a severe form of
olivopontocerebellar degeneration that required nursing home care at 32 yrs of age,
the patient also had the retinal degeneration characteristic of patients with SCA7.

Genomic DNA from the proband of kindred A (Fig. 8) was digested with
EcoRI and 2D-RED was performed to identify the RED-positive size fractions (Fig.
7A). The DNA fragments from the fraction that generated a RED60 product were then
cloned into a lambda vector. Ten pools consisting of approximately 5x10^4 primary
lambda clones each were amplified, mass excision was performed on each
pool separately to convert the lambda clones into plasmids, and RED analysis was
performed on the isolated plasmid DNA from these pools. The DNA from a RED
positive clone pool was selectively enriched for CAG-containing clones and DNA
from small pools of clones (36 clones in each pool) from this enriched library was
assayed by RED (Fig. 7B). RED analysis of DNA prepared from clones individually
inoculated from a RED positive plate identified two clones that contained the
expanded CAG repeat, and the genomic inserts from these clones were sequenced.
Sequence analysis (see Methods) revealed that one end of the genomic EcoRI
fragment overlaps with a set of ESTs that have been mapped to the SCA7 region of
chromosome 3p, which in turn physically mapped the CAG expansion on this
genomic fragment to the same region.

The genomic sequence flanking the CAG expansion isolated from the patient
with ataxia and retinal degeneration is shown in Fig. 7C. PCR primers were designed
from this sequence to amplify across the CAG repeat and PCR analysis was
performed on samples from five kindreds in our ataxia family collection that have
been diagnosed with retinopathy as well as on a large panel of unaffected control
genomic DNA templates. This analysis, the results of which are summarized
in Fig. 8, showed that the CAG repeat sequence is expanded in affected individuals and
one at-risk individual (37 to 68 repeats) in these ataxia kindreds but not in any of the
unaffected controls. As is the case with other ataxia CAG mutations, the expanded
allele is unstable and the age of onset and the repeat size are inversely correlated.
Marked anticipation is observed for both male and female transmissions.

Patents, patent applications and publications disclosed herein are hereby
incorporated by reference as if individually incorporated. It is to be understood that
the above description is intended to be illustrative, and not restrictive. Various
modifications and alterations of this invention will become apparent to those skilled
in the art from the foregoing description without departing from the scope and the
spirit of this invention, and it should be understood that this invention is not to be
unduly limited to the illustrative embodiments set forth herein.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: REGENTS OF THE UNIVERSITY OF MINNESOTA
(ii) TITLE OF INVENTION: SCA7 GENE AND METHODS OF USE 
(iii) NUMBER OF SEQUENCES: 14 
(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: SMART & BIGGAR 

(B) STREET: P.O. BOX 2999, STATION D 

(C) CITY: OTTAWA 

(D) STATE: ONT 

(E) COUNTRY: CANADA 

(F) ZIP: K1P 5Y6

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: ASCII (text) 
(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: CA 2,245,310 

(B) FILING DATE : 19-AUG-1998 

(C) CLASSIFICATION: 
(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/056,170 

(B) FILING DATE: 19-AUG-1997 
(viii) ATTORNEY/ AGENT INFORMATION: 

(A) NAME: SMART & BIGGAR 

(B) REGISTRATION NUMBER: 

(C) REFERENCE/DOCKET NUMBER: 76433-8 
(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (613)-232-2486

(B) TELEFAX: (613)-232-8440

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 477 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
CGACTCTTTC CCCCTTTTTT TTGTTACATT GTAGGAGCGG AAAGAATGTC GGAGCGGGCC 60 
GCGGATGACG TCAGGGGGGA GCCGCGCCGC GCGGCGGCGG CGGCGGGCGG AGCAGCGGCC 120 
GCCCGGCAGC AGCAGCAGCA GCAGCAGCAG CAGCAGCAGC AGCAGCAGCA GCAGCAGCAG 180 
CAGCAGCAGC AGCAGCAGCA GCAGCAGCAG CAGCAGCAGC AGCAGCAGCA GCAGCAGCAG 240 
CAGCAGCAGC AGCAGCAGCA GCAGCAGCAG CAGCAGCAGC AGCAGCCGCC GCCTCCGCAG 300 
CCCCAGCGGC AGCAGCACCC GCCACCGCCG CCACGGCGCA CACGGCCGGA GGACGGCGGG 360 
CCCGGCGCCG CCTCCACCTC GGCCGCCGCA ATGGCGACGG TCGGGGAGCG CAGGCCTCTG 420 
CCCAGTCCTG AAGTGATGCT GGGACAGTCG TGGAATCTGT GGGTTGAGGC TTCCAAA 477 


(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1864 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE : cDNA 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homo sapiens 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 
GAATTCAAGT TCCCCAGGCG CGCACAGTGT GACTTCCAAT TGCGGGTCGG GGGCACACAT 60
GTGGGCGGCG TGGCGTGCAC ACGGTACTTC GTCCTGACAC CTGGGCAGAC ACCTGGGAGG 120
CACTTCCCCG CTAGCCCAAG GTTCCCTGCA CGCCCGGAGT CCGCTCTGCG GCGGCTTCCC 180
ATTCATGCTT TTGACAACTC TGCGGCCGCC CGCAAGCCGA GGGCAAAGTG CCCCCTGCAC 240
CAGCCTCTCC CGCGCTGCCC TGGGGCCGGC CGGCCGGCTC CTCCATAGGT GGCTGCATTT 300
CCGACTTCGC CCTGGCTCCA GTCCGGGGGC TTGACGTGCA AACTTCGCCG GAGCGAGCTG 360
AGAGGGAGGG ACCGGCAAGT GGGAGGAGGC GGCGGGAGGC GTCTCCGCTT AAGGGAGCCG 420
GCCGGGTGCC GCCGCTCGGA CGGACGCGCG GACGGAAGGA AGGAGCGGGG CAGCCGGGCC 480
GGGCCCGGGG ATGCAAGCGG CCGAAGGGTG GGCAGCTGGA GGTCCTGGGG TGCGGCTCGG 540
GCTTCCCCGC GCGGGCTGCC ATGGTGGGGC GCGGGGTTGG AGCCGGGCCG CTCCGGCGCT 600
GGCCTCCGCG CCAGGTCCTC TGAGCAGAAG CAGGCAGGGG ACCCAGCGCC GCGGTGGCGG 660
GCCGCCTGCT GCCCGTCCCC TCCCTCGGGC GGCCGCGGGA GTCGAAAGCG AAAGCTAGCC 720
CGCGCCGCGA CTTGAGCCCG GGGCGGGGGT GGCCTTGAGG AGGCGGGCT? [illegible] 780
GCGGCCATGG GGGCGCTGTC AGCGTGCCCC ACCCGGTCCG CGGGCCGCGC [illegible] 840
AACTCCCTGG CGCCTCCTTA AAAAACGGCC CCCGCGCGAC TCTTTCCCCC TTTTTTTTGT 900
TACATTGTAG GAGCGGAAAG AATGTCGGAG CGGGCCGCGG ATGACGTCAG GGGGGAGCCG 960
CGCCGCGCGG CGGCGGCGGC GGGCGGAGCA GCGGCCGCCC GGCAGCAGCA GCAGCAGCAG 1020
CAGCAGCAGC AGCCGCCGCC TCCGCAGCCC CAGCGGCAGC AGCACCCGCC ACCGCCGCCA 1080
CGGCGCACAC GGCCGGAGGA CGGCGGGCCC GGCGCCGCCT CCACCTCGGC CGCCGCAATG 1140
GCGACGGTCG GGGAGCGCAG GCCTCTGCCC AGTCCTGAAG TGATGCTGGG ACAGTCGTGG 1200
AATCTGTGGG TTGAGGCTTC CAAACTTCCT GGGAAGGACG GTGAGTGTCC [illegible] 1260
CCCCCCTTCA CCCCCTCGCG ACCCCCTCCT CTCTCCTCCC CTCCCCCCTG [illegible] 1320
GTGACCCGCC CCCTCGAGGG GCAGANATGC TATCGTTTGC TGGGTTGCGG [illegible] 1380
TGCCCACACC TACCCCGTGC GTGCGTGAGT GTGCGTCACA CTCCTGGCCA CTGA??TG?? 1440
TCTCCCCTCC TCCTGTGTGT GTATATCTCC TAGGACAGAA TTGGACGAAA GTTTCAAGGA 1500
GTTTGGGAAA ACCGCGAAGT CATGGGGCTC TGTCGGGAAG GTGAGTCCAG CCCCCCTGAT 1560
GGAGTTTGTA CAAACCCCTG GGAAGTTTCA TTGACAGTTC ACTGGGACCG GGAACATCAG 1620
CCCACCATAC CGACTCCCCG ACTCCCCGTG CCTGCGAAGA TGCTGCCTGA GGAGGGAGGG 1680
AGGGGGCAGA GCGCTTGGAA AGTTTGGTTT GGGGGCCTCC TGTAATGAGA GCGTCCGGAA 1740
TCCTTCTGTG ACCAGGCAGG AGCAGCATTA TTGGTGATGA GCGCTGGGAA CCGGCGGGAA 1800
GTTTAACATA GATCTCTGCA TTTCTGACCT CCTTACGGAG AAACAGGAGT AGAGGAAGGA 1860
ATTC 1864

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 300 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(vi) ORIGINAL SOURCE:
(A) ORGANISM: homo sapiens
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
CAAGCAGAAA GGGGGCTGCA AAGCTGCCTG CCTAGGGCTA CGTTTCCTGG CAAAACTTCC 
GAAAGCCATT TCTCCAAAAG AAGGTCTAGA AGAGGAGGAG GAGGAGGAGA AGGAGGAGGA 
GGAGGAGGAG CAGCAGCAGC AGCAGCAGCA GCAGCAGCAG CAGCAGCAGC AGCAGCAGCA 
GCATGAAAGA GCCCCACTTG GAAGGCGGTT TGGATTTTAT TTGTGTGTTT TGTGGATTCT 
TTTTATTTTG CTTTACAAAT GCATCTTACA CCAAACTCAT CTGGCATTAA AAATGAATTC 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer" 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homo sapiens 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
CAGCAGCAGC AGCAGCAGCA GCAGCAGCAG 

(2) INFORMATION FOR SEQ ID NO : 5 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer" 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5 : 
TTTTTTGTTA CATTGTAGGA GCG

(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "Primer" 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6 : 
CACTTCAGGA CTGGGCAGAG 

(2) INFORMATION FOR SEQ ID NO : 7 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "Primer" 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GCCAGATGAG TTTGGTGTAA GAT 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer" 
(vi) ORIGINAL SOURCE:
(A) ORGANISM: homo sapiens
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
AAGCCATTTC TCCAAAAGAA GGTC 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 129 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer" 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homo sapiens 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGACTCTTTC CCCCTTTTTT TTGTTACATT GTACCAGGAG CGGAAAGAAT GTCGGAGCGG 
GCCGCGGATG ACGTCAGGGG GGAGCCGCGC CGCGCGGCGG CGGCGGCGGG CGGAGCAGCG 
GCCGCCCGG 

(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 192 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "Primer" 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homo sapiens 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CCGCCGCCTC CGCAGCCCCA GCCGCAGCAG CACCCGCCAC CGCCGCCACG GCGCACACGG 
CCGGAGGACG GCGGGCCCGG CGCCGCCTCC ACCTCGGCCG CCGCAATGGC GACGGTCGGG 
GAGCGCAGGC CTCTGCCCAG TCCTGAAGTG ATGCTGGGAC AGTCGTGGAA TCTGTGGGTT 
GAGGCTTCCA AA

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:

(A) ORGANISM: homo sapiens
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Ser Glu Arg Ala Ala Asp Asp Val Arg Gly Glu Pro Arg Arg Ala
1               5                   10                  15

Ala Ala Ala Ala Gly Gly Ala Ala Ala Ala Arg
            20                  25

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 129 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(vi) ORIGINAL SOURCE:

(A) ORGANISM: homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Ser Glu Arg Ala Ala Asp Asp Val Arg Gly Glu Pro Arg Arg Ala
1               5                   10                  15

Ala Ala Ala Ala Gly Gly Ala Ala Ala Ala Arg Gln Gln Gln Gln Gln
            20                  25                  30

Gln Gln Gln Gln Gln Pro Pro Pro Pro Gln Pro Gln Arg Gln Gln His
        35                  40                  45

Pro Pro Pro Pro Pro Arg Arg Thr Arg Pro Glu Asp Gly Gly Pro Gly
    50                  55                  60

Ala Ala Ser Thr Ser Ala Ala Ala Met Ala Thr Val Gly Glu Arg Arg
65                  70                  75                  80

Pro Leu Pro Ser Pro Glu Val Met Leu Gly Gln Ser Trp Asn Leu Trp
                85                  90                  95

Val Glu Ala Ser Lys Leu Pro Gly Lys Asp Glu Asp Arg Ile Gly Arg
            100                 105                 110

Lys Phe Gln Gly Val Trp Glu Asn Arg Glu Val Met Gly Leu Cys Arg
        115                 120                 125

Glu

(2) INFORMATION FOR SEQ ID NO: 13: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1002 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(vi) ORIGINAL SOURCE: 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

GAATTCAAGT TCCCCAGGCG CGCACAGTGT GACTTCCAAT TGCGGGTCGG GGGCACACAT 60
GTGGGCGGCG TGGCGTGCAC ACGGTACTTC GTCCTGACAC CTGGGCAGAC ACCTGGGAGG 120
CACTTCCCCG CTAGCCCAAG GTTCCCTGCA CGCCCGGAGT CCGCTCTGCG GCGGCTTCCC 180
ATTCATGCTT TTGACAACTC TGCGGCCGCC CGCAAGCCGA GGGCAAAGTG CCCCCTGCAC 240
CAGCCTCTCC CGCGCTGCCC TGGGGCCGGC CGGCCGGCTC CTCCATAGGT GGCTGCATTT 300
CCGACTTCGC CCTGGCTCCA GTCCGGGGGC TTGACGTGCA AACTTCGCCG GAGCGAGCTG 360
AGAGGGAGGG ACCGGCAAGT GGGAGGAGGC GGCGGGAGGC GTCTCCGCTT AAGGGAGCCG 420
GCCGGGTGCC GCCGCTCGGA CGGACGCGCG GACGGAAGGA AGGAGCGGGG CAGCCGGGCC 480
GGGCCCGGGG ATGCAAGCGG CCGAAGGGTG GGCAGCTGGA GGTCCTGGGG TGCGGCTCGG 540
GCTTCCCCGC GCGGGCTGCC ATGGTGGGGC GCGGGGTTGG AGCCGGGCCG CTCCGGCGCT 600
GGCCTCCGCG CCAGGTCCTC TGAGCAGAAG CAGGCAGGGG ACCCAGCGCC GCGGTGGCGG 660
GCCGCCTGCT GCCCGTCCCC TCCCTCGGGC GGCCGCGGGA GTCGAAAGCG AAAGCTAGCC 720
CGCGCCGCGA CTTGAGCCCG GGGCGGGGGT GGCCTTGAGG AGGCGGGCTC GGGGGGCTGG 780
GCGGCCATGG GGGCGCTGTC AGCGTGCCCC ACCCGGTCCG CGGGCCGCGC ACGCCGCCGG 840
AACTCCCTGG CGCCTCCTTA AAAAACGGCC CCCGCGCGAC TCTTTCCCCC TTTTTTTTGT 900
TACATTGTAG GAGCGGAAAG AATGTCGGAG CGGGCCGCGG ATGACGTCAG GGGGGAGCCG 960
CGCCGCGCGG CGGCGGCGGC GGGCGGAGCA GCGGCCGCCC GG 1002


(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 832 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homo sapiens 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

CCGCCGCCTC CGCAGCCCCA GCGGCAGCAG CACCCGCCAC CGCCGCCACG GCGCACACGG 60
CCGGAGGACG GCGGGCCCGG CGCCGCCTCC ACCTCGGCCG CCGCAATGGC GACGGTCGGG 120
GAGCGCAGGC CTCTGCCCAG TCCTGAAGTG ATGCTGGGAC AGTCGTGGAA TCTGTGGGTT 180
GAGGCTTCCA AACTTCCTGG GAAGGACGGT GAGTGTCCAC GCCCTCCTCC CCCCTTCACC 240
CCCTCGCGAC CCCCTCCTCT CTCCTCCCCT CCCCCCTGCC CCCCTCCTGT GACCCGCCCC 300
CTCGAGGGGC AGANATGCTA TCGTTTGCTG GGTTGCGGAA CGCGGAGGTG CCCACACCTA 360
CCCCGTGCGT GCGTGAGTGT GCGTCACACT CCTGGCCACT GACCTGCCTC TCCCCTCCTC 420
CTGTGTGTGT ATATCTCCTA GGACAGAATT GGACGAAAGT TTCAAGGAGT TTGGGAAAAC 480
CGCGAAGTCA TGGGGCTCTG TCGGGAAGGT GAGTCCAGCC CCCCTGATGG AGTTTGTACA 540
AACCCCTGGG AAGTTTCATT GACAGTTCAC TGGGACCGGG AACATCAGCC CACCATACCG 600
ACTCCCCGAC TCCCCGTGCC TGCGAAGATG CTGCCTGAGG AGGGAGGGAG GGGGCAGAGC 660
GCTTGGAAAG TTTGGTTTGG GGGCCTCCTG TAATGAGAGC GTCCGGAATC CTTCTGTGAC 720
CAGGCAGGAG CAGCATTATT GGTGATGAGC GCTGGGAACC GGCGGGAAGT TTAACATAGA 780
TCTCTGCATT TCTGACCTCC TTACGGAGAA ACAGGAGTAG AGGAAGGAAT TC 832

What is Claimed is:

1. A method for identifying individuals at risk for developing spinocerebellar
ataxia type 7 comprising the step of:

analyzing the CAG repeat region of a spinocerebellar ataxia type 7 gene to
detect CAG repeats in the CAG repeat region, wherein individuals at risk for
developing spinocerebellar ataxia type 7 have at least about 30 CAG repeats in
the CAG repeat region.

2. The method of claim 1 wherein individuals at risk for developing
spinocerebellar ataxia type 7 have at least about 37 CAG repeats in the CAG repeat
region.

3. The method of claim 1 wherein individuals at risk for developing
spinocerebellar ataxia type 7 have at least about 38 CAG repeats in the CAG repeat
region.

4. The method of Claim 1 wherein the analyzing step comprises the steps of:
performing a polymerase chain reaction with oligonucleotide primers capable
of amplifying the CAG repeat region located within the spinocerebellar ataxia
type 7 gene; and

detecting amplified DNA fragments containing the CAG repeat region.

5. The method of claim 4 wherein the oligonucleotide primers are selected from
the region of SEQ ID NO:9, that region corresponding to
CGACTCTTTCCCCCTTTTTTTTGTTACATTGTACCAGGAGCGGAAAGAATG
TCGGAGCGGGCCGCGGATGACGTCAGGGGGGAGCCGCGCCGCGCGGCGG
CGGCGGCGGGCGGAGCAGCGGCCGCCCGG and from the region of SEQ ID
NO:10, that region corresponding to
CCGCCGCCTCCGCAGCCCCAGCCGCAGCAGCACCCGCCACCGCCGCCACG
GCGCACACGGCCGGAGGACGGCGGGCCCGGCGCCGCCTCCACCTCGGCC
GCCGCAATGGCGACGGTCGGGGAGCGCAGGCCTCTGCCCAGTCCTGAAGT
GATGCTGGGACAGTCGTGGAATCTGTGGGTTGAGGCTTCCAAA, both
sequences from SEQ ID NO:1.

6. The method of claim 4 wherein the oligonucleotide primers are SEQ ID NO:5
and SEQ ID NO:6.

7. The method of claim 4 wherein the oligonucleotide primers are selected from
the region of SEQ ID NO:13, that region corresponding to nucleotides from 1 to 1002,
and from the region of SEQ ID NO:14, that region corresponding to nucleotides from
1033 to 1864, both sequences from SEQ ID NO:2.

8. A kit for detecting whether or not an individual is at risk for developing
spinocerebellar ataxia type 7 comprising oligonucleotides selected according to Claim
5.

9. A method for detecting the presence of a DNA molecule located within an
affected allele of the SCA7 gene comprising:

(a) treating separate complementary strands of a DNA molecule containing a
CAG repeat region of the SCA7 gene with a molar excess of two
oligonucleotide primers;

(b) extending the primers to form complementary primer extension products
which act as templates for synthesizing the desired molecule containing
the CAG repeat region;

(c) detecting the molecule so amplified; and

(d) analyzing the amplified DNA molecule for a CAG repeat region
characteristic of the SCA7 disorder.

10. The method of claim 9 wherein the step of analyzing comprises analyzing for a
(CAG)n region wherein n is at least about 38.

11. The method of claim 9 wherein the step of analyzing comprises analyzing for a
(CAG)n region wherein n is at least about 37.

12. The method of claim 9 wherein the step of analyzing comprises analyzing for a
(CAG)n region wherein n is at least about 30.
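
The analyzing step of claims 9 to 12 turns on the value of n in a (CAG)n region. Assuming the amplified molecule has been sequenced and is available as text (the claims do not prescribe how n is determined), a minimal Python sketch for finding the longest uninterrupted run of CAG units might look like the following. The function name and the example fragment are hypothetical, and interrupted repeats are not handled.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Return the repeat count of the longest uninterrupted (CAG)n run."""
    # Normalise the input: drop listing-style whitespace and upper-case it.
    seq = re.sub(r"\s+", "", sequence).upper()
    # Each match is one run of back-to-back CAG units.
    runs = re.findall(r"(?:CAG)+", seq)
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical amplified fragment: invented flanks around 38 CAG units.
example = "CCGG" + "CAG" * 38 + "CCGCCG"
print(longest_cag_run(example))
```

In practice n is often estimated from fragment size on a gel rather than from sequence text; this sketch covers only the sequence-based case.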

13. A method for detecting the presence of a DNA molecule containing a CAG
repeat region of the SCA7 gene comprising:

(a) digesting genomic DNA with a restriction endonuclease to obtain DNA
fragments;

(b) probing said DNA fragments under hybridizing conditions with a
detectably labeled gene probe, which hybridizes to a nucleic acid
molecule containing a CAG repeat region of an isolated SCA7 gene
having at least about 11 nucleotides;

(c) detecting probe DNA which has hybridized to said DNA fragments;
and

(d) analyzing the DNA fragments for a CAG repeat region characteristic of
the normal or affected forms of the SCA7 gene.

14. The method of claim 13 wherein the step of analyzing comprises analyzing for a
(CAG)n region wherein n is less than about 19.

15. The method of claim 13 wherein the step of analyzing comprises analyzing for a
(CAG)n region wherein n is less than about 15.

16. The method of claim 13 wherein the step of analyzing comprises analyzing for a
(CAG)n region wherein n is less than about 5.

17. The method of claim 13 wherein the step of analyzing comprises analyzing for a
(CAG)n region wherein n is at least about 38.

18. The method of claim 13 wherein the step of analyzing comprises analyzing for a
(CAG)n region wherein n is at least about 37.

19. The method of claim 13 wherein the step of analyzing comprises analyzing for a
(CAG)n region wherein n is at least about 30.
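
The repeat-count thresholds recited in claims 14 to 19 (and likewise in claims 1 to 3 and 21 to 23) can be illustrated with a short Python sketch. The default cut-offs below use the "at least about 38" and "less than about 19" values from the claims; the function name, the label strings, and the "indeterminate" label for counts falling between the two cut-offs are assumptions made for illustration and are not taken from the claims.

```python
def classify_repeat_count(n: int, at_risk_min: int = 38,
                          not_at_risk_below: int = 19) -> str:
    """Label a CAG repeat count against two illustrative thresholds."""
    if n < 0:
        raise ValueError("repeat count cannot be negative")
    if n >= at_risk_min:
        return "at risk"
    if n < not_at_risk_below:
        return "not at risk"
    # Counts between the two cut-offs are not classified by this sketch.
    return "indeterminate"


for count in (10, 25, 45):
    print(count, classify_repeat_count(count))
```

Passing `at_risk_min=37` or `at_risk_min=30`, or `not_at_risk_below=15` or `5`, reproduces the alternative thresholds recited in the other claims.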

20. The method of claim 13 wherein the DNA probe is selected from the region of
SEQ ID NO:9, that region corresponding to
CGACTCTTTCCCCCTTTTTTTTGTTACATTGTACCAGGAGCGGAAAGA
ATGTCGGAGCGGGCCGCGGATGACGTCAGGGGGGAGCCGCGCCGCG
CGGCGGCGGCGGCGGGCGGAGCAGCGGCCGCCCGG or from the region
of SEQ ID NO:10, that region corresponding to
CCGCCGCCTCCGCAGCCCCAGCCGCAGCAGCACCCGCCACCGCCGC
CACGGCGCACACGGCCGGAGGACGGCGGGCCCGGCGCCGCCTCCAC
CTCGGCCGCCGCAATGGCGACGGTCGGGGAGCGCAGGCCTCTGCCC
AGTCCTGAAGTGATGCTGGGACAGTCGTGGAATCTGTGGGTTGAGGC
TTCCAAA, both sequences from SEQ ID NO:1.

21. A method for determining whether an individual is at risk for developing
spinocerebellar ataxia type 7 comprising the step of:

analyzing the CAG repeat region of a spinocerebellar ataxia type 7
gene wherein individuals who are not at risk for developing spinocerebellar
ataxia type 7 have less than 19 CAG repeats in the CAG repeat region.

22. The method of claim 21 wherein individuals who are not at risk for developing
spinocerebellar ataxia type 7 have less than about 15 CAG repeats in the CAG
repeat region.

23. The method of claim 21 wherein individuals who are not at risk for developing
spinocerebellar ataxia type 7 have less than about 5 CAG repeats in the CAG
repeat region.

24. A nucleic acid molecule containing a CAG repeat region of an isolated
spinocerebellar ataxia type 7 (SCA7) gene, said gene located within the short
arm of chromosome 3.

25. The nucleic acid molecule of claim 24 corresponding to the entire SCA7 gene.

26. The nucleic acid molecule of claim 24 wherein the SCA7 gene comprises SEQ
ID NO:9 followed by a polyglutamine region.

27. The nucleic acid molecule of claim 24 wherein the SCA7 gene comprises SEQ
ID NO:13 followed by a polyglutamine region.

28. An isolated DNA fragment comprising bases 1-128 of SEQ ID NO:1.

29. An isolated DNA fragment comprising bases 286-476 of SEQ ID NO:1.

30. An isolated DNA fragment comprising bases 1-128 of SEQ ID NO:1 in a
vector.

31. A polypeptide comprising amino acids encoded by the nucleic acid of SEQ ID
NO:1.

32. An isolated DNA fragment comprising bases 1-128 of SEQ ID NO:1 and further
comprising a CAG repeat region.

33. An oligonucleotide comprising at least 15 nucleotides from SEQ ID NO: 9. 

34. An oligonucleotide comprising at least 15 nucleotides from SEQ ID NO:10. 
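
Claims 33 and 34 refer to oligonucleotides comprising at least 15 nucleotides from a listed sequence. One possible reading, assumed here purely for illustration, is that the oligonucleotide shares a contiguous stretch of at least 15 bases with the reference sequence or with its reverse complement. The Python sketch below checks that condition; the function names are hypothetical, and the contiguity and either-strand interpretation is an assumption rather than something the claims state.

```python
# SEQ ID NO: 9 copied from the listing; group spacing is removed.
SEQ_ID_NO_9 = (
    "CGACTCTTTC CCCCTTTTTT TTGTTACATT GTACCAGGAG CGGAAAGAAT GTCGGAGCGG "
    "GCCGCGGATG ACGTCAGGGG GGAGCCGCGC CGCGCGGCGG CGGCGGCGGG CGGAGCAGCG "
    "GCCGCCCGG"
).replace(" ", "")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shares_contiguous_stretch(oligo: str, reference: str, k: int = 15) -> bool:
    """True if any k-base window of oligo occurs in reference on either strand."""
    oligo = oligo.upper()
    reference = reference.upper()
    strands = (reference, reverse_complement(reference))
    # An oligo shorter than k yields no windows, so the result is False.
    return any(
        oligo[i:i + k] in strand
        for i in range(len(oligo) - k + 1)
        for strand in strands
    )


# A 20-base window taken directly from the reference, used as a test probe.
probe = SEQ_ID_NO_9[40:60]
print(shares_contiguous_stretch(probe, SEQ_ID_NO_9))
```

The same check applies to any other reference sequence in the listing by passing it as the `reference` argument.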
35. An isolated oligonucleotide that hybridizes to a nucleic acid molecule
containing a CAG repeat region of an isolated SCA7 gene; said oligonucleotide
having at least about 11 nucleotides.

36. An isolated recombinant vector comprising the nucleotides of SEQ ID NO:1
operatively linked to heterologous vector sequences.

37. The isolated recombinant vector according to Claim 36 wherein the vector is
capable of expressing a polypeptide comprising amino acids 1 to 27 of SEQ ID
NO:11 followed by a polyglutamine repeat region.

38. An isolated nucleic acid fragment encoding the polypeptide for spinocerebellar
ataxia type 7, wherein the polypeptide comprises amino acids 1 to 27 of SEQ ID
NO:11 followed by a polyglutamine repeat region.

39. Cells containing the vector of Claim 36.

40. An isolated DNA fragment comprising bases 922-1002 of SEQ ID NO:2.

41. An isolated DNA fragment comprising bases 1033-1864 of SEQ ID NO:2.

42. An isolated DNA fragment comprising bases 922-1002 of SEQ ID NO:2 in a
vector.

43. A polypeptide comprising amino acids encoded by the nucleic acid of SEQ ID
NO:2.

44. An isolated DNA fragment comprising bases 922-1002 of SEQ ID NO:2 and
further comprising a CAG repeat region.

45. An oligonucleotide comprising at least 15 nucleotides from SEQ ID NO:13.

46. An oligonucleotide comprising at least 15 nucleotides from SEQ ID NO:14.

47. An isolated recombinant vector comprising the nucleotides of SEQ ID NO:2
operatively linked to heterologous vector sequences.

48. The isolated recombinant vector according to Claim 47 wherein the vector is
capable of expressing a polypeptide comprising amino acids 1 to 27 of SEQ ID
NO:12 followed by a polyglutamine repeat region.

49. Cells containing the vector of Claim 47.

50. An isolated nucleic acid fragment encoding the polypeptide for spinocerebellar
ataxia type 7, wherein the polypeptide comprises amino acids 1 to 27 of SEQ ID
NO:12 followed by a polyglutamine repeat region.

51. An isolated recombinant vector comprising the nucleotides of SEQ ID NO:2
operatively linked to heterologous vector sequences.

52. A protein encoded by the SCA7 gene having therein a glutamine repeat region.

53. The protein of claim 52 having a molecular weight of about 95-108 kD.

54. The protein of claim 52 wherein the protein comprises the amino acids 1-27 of
SEQ ID NO:11 followed by a polyglutamine region.

55. An antibody to a protein encoded by DNA containing a CAG repeat region of
the SCA7 gene.

56. A method for detecting the SCA7 disorder comprising:
(a) contacting an antibody to a protein encoded by the SCA7 gene with a
biological sample containing antigenic protein to form an antibody-antigen
complex;

(b) isolating the antibody-antigen complex; and

(c) sequencing the antigen portion of the antibody-antigen complex using
amino acid sequencing techniques.


57. A method for identifying expanded repeats from genomic DNA comprising:
a. fractionating of a population of DNA fragments and detecting the
fraction that contains an expanded repeat;

b. cloning the DNA fragments contained in the fraction of DNA that
contains an expanded repeat; and

c. identifying the clones that contain the expanded repeat.

58. The method of claim 57 wherein the fractionation step comprises:

a. digesting genomic DNA with a restriction enzyme to obtain DNA
fragments;

b. resolving the DNA fragments by gel electrophoresis and dividing into
fractions on the basis of size; and

c. detecting the presence of an expanded repeat in each size fraction.


59. The method of claim 58 wherein the extended repeat comprises a (CAG)n
region.

60. The method of claim 59 wherein n is greater than 20.

61. The method of claim 59 wherein n is greater than 30.

62. A method for identifying expanded trinucleotide repeats from genomic DNA
comprising performing RED analysis on a sample of genomic DNA wherein the
rate of temperature change from the denaturation temperature is decreased and
wherein the ligation buffer contains formamide.

63. The method of claim 62 wherein the rate of temperature change from the
denaturation temperature is decreased to 2 seconds per degree.

64. The method of claim 62 wherein the ligation buffer contains 4% formamide.

65. The method of claim 62 wherein the expanded trinucleotide repeat consists of 
CAG.

CAAGCAGAAAGGGGGCTGCAAAGCTGCCTGCCTAGGGCTACGTTTCCTGGCAAAACTTCC 60
GAAAGCCATTTCTCCAAAAGAAGGTCTAGAAGAGGAGGAGGAGGAGGAGAAGGAGGAGGA 120
GGAGGAGGA GCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGC A GCAGCA 180
GCATGAAAGAGCCCCACTTGGAAGGCGGTTTGGATTTTATTTGTGTGTTTTGTGGATTCT 240
TTTTATTTTGCTTTACAAATGCATCTTACACCAAACTCATCTGGCATTAAAAATGAATTC 300

CGACTCTTTCCCCCTTTTTTTTGTTACATTGTAGGAGCGGAAAGAATGTCGGAGCGGGCC 60
GCGGATGACGTCAGGGGGGAGCCGCGCCGCGCGGCGGCGGCGGCGGGCGGAGCAGCGGCC 120 
GCCCG GCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG 180 
CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG 240 
CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CCGCCGCCTCCGCAG 300
CCCCAGCGGCAGCAGCACCCGCCACCGCCGCCACGGCGCACACGGCCGGAGGACGGCGGG 360
CCCGGCGCCGCCTCCACCTCGGCCGCCGCAATGGCGACGGTCGGGGAGCGCAGGCCTCTG 420
CCCAGTCCTGAAGTGATGCTGGGACAGTCGTGGAATCTGTGGGTTGAGGCTTCCAAA